Unit1_math 4
Greater Noida
Statistical Technique-I
Unit: I
• Unit V: Time & Work, Pipe & Cistern, Time, Speed & Distance, Boat &
Stream, Seating arrangement, Analogy.
Data Analysis
Artificial intelligence
Network and Traffic modeling
PSO1 The ability to identify, analyze real-world problems and design
their ethical solutions using artificial intelligence, robotics,
virtual/augmented reality, data analytics, blockchain technology,
and cloud computing.
PSO2 The ability to design and develop the hardware sensor devices
and related interfacing software systems for solving complex
engineering problems.
PSO3 The ability to understand interdisciplinary computing techniques
and to apply them in the design of advanced computing.
PSO4 The ability to conduct investigation of complex problems with the
help of technical, managerial, leadership qualities, and modern
engineering tools provided by industry-sponsored laboratories.
S. No  Course Outcome  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
1      BAS0402.1       3    3    3    3    2    -    1    2    2    1     2     2
2      BAS0402.2       3    3    3    3    2    2    1    2    2    1     2     2
3      BAS0402.3       3    3    2    2    2    -    -    -    -    -     -     1
4      BAS0402.4       3    3    3    3    2    -    -    2    2    1     -     2
5      BAS0402.5       3    3    3    3    2    3    2    -    3    3     2     3
6      Average         3.0  3.0  2.8  2.8  2.0  1.0  0.8  1.2  1.8  1.2   1.2   2.0
• PSO1: Identify, analyze real-world problems and design their ethical
solutions using artificial intelligence, robotics, virtual/augmented reality,
data analytics, blockchain technology, and cloud computing.
• PSO2: Design and develop the hardware sensor devices and related
interfacing software systems for solving complex engineering problems.
• PSO3: Understand interdisciplinary computing techniques and apply
them in the design of advanced computing.
• PSO4: Conduct investigation of complex problems with the help of
technical, managerial, leadership qualities, and modern engineering tools
provided by industry-sponsored laboratories.
IOT IV A 49 45 91.83%
Unit 1 https://archive.nptel.ac.in/courses/111/105/111105042/
https://archive.nptel.ac.in/courses/110/107/110107114/
Unit 2 https://archive.nptel.ac.in/courses/103/106/103106120/
Unit 3 https://archive.nptel.ac.in/courses/117/105/117105085/
Unit 4 https://archive.nptel.ac.in/courses/111/104/111104032/
Unit 5 https://www.youtube.com/watch?v=KZ_M5RWaP6A
https://www.youtube.com/watch?v=WP4jsNRgfa4
https://www.youtube.com/watch?v=jPaQDKbahU8
https://www.youtube.com/watch?v=FwiWJLicakg
SECTION – A CO
1. Attempt all parts- [10×1=10]
• Introduction
• Measures of central tendency: Mean, Median, Mode.
• Moment
• Skewness
• Kurtosis
• Curve Fitting
• Method of least squares
• Fitting of straight lines
• Fitting of second degree parabola
• Exponential curves
• Correlation and Rank correlation
• Linear regression
• Nonlinear regression
• Multiple linear regression
Arithmetic Mean
Definition
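Arithmetic mean of a set of observations is their sum divided by the
number of observations, i.e., the arithmetic mean $\bar{x}$ of n observations
x1, x2, ..., xn is given by:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
For a frequency distribution in which $x_i$ occurs with frequency $f_i$, the formula becomes
$\bar{x} = \frac{1}{N}\sum_{i} f_i x_i$, where $N = \sum_{i} f_i$.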
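Both formulas can be checked with a short Python sketch. The function names `arithmetic_mean` and `weighted_mean` are illustrative and not part of these notes.

```python
def arithmetic_mean(values):
    """Ungrouped data: sum of the observations divided by their number."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def weighted_mean(xs, fs):
    """Frequency distribution: sum(f_i * x_i) / N, where N = sum(f_i)."""
    n = sum(fs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(xs, fs)) / n


print(arithmetic_mean([3, 6, 8, 10, 18]))   # 9.0
print(weighted_mean([1, 2, 3], [2, 1, 1]))  # (2 + 2 + 3) / 4 = 1.75
```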
Median:
Definition: Median of a distribution is the value of the variable which
divides it into two equal parts.
It is the value such that the number of observations above it is equal
to the number of observations below it. The median is thus a
positional average.
Ungrouped Data:
• If the number of observations is odd, the median is the middle value
after the values have been arranged in ascending or descending order
of magnitude.
• If the number of observations is even, there are two middle terms,
and the median is obtained by taking the arithmetic mean of the two
middle terms.
Example
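1. Median of values 25, 20, 15, 35, 18: in ascending order 15, 18, 20, 25, 35, so the median is 20.
2. Median of values 8, 20, 50, 25, 15, 30: in ascending order 8, 15, 20, 25, 30, 50, so the median is (20 + 25)/2 = 22.5.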
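The odd/even rule translates directly into a few lines of Python; this is a minimal sketch with an illustrative function name. Python's standard library function `statistics.median` applies the same rule.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)              # arrange in ascending order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:                        # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the middle terms


print(median([25, 20, 15, 35, 18]))      # 20
print(median([8, 20, 50, 25, 15, 30]))   # 22.5
```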
Discrete Frequency Distribution
In this case the median is obtained by considering the cumulative
frequencies. The steps involved are:
i. Find $N/2$, where $N = \sum f_i$.
ii. See the cumulative frequency (c.f.) just greater than $N/2$.
iii. The corresponding value of x is the median.
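Steps i to iii can be sketched as follows, assuming the x values are listed in ascending order. The function name and the sample distribution are hypothetical, chosen only to illustrate the rule (it uses "strictly greater than N/2", as in step ii).

```python
def discrete_median(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order)."""
    n = sum(fs)                      # N = sum of frequencies
    half = n / 2                     # step i: N/2
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f              # running cumulative frequency
        if cumulative > half:        # step ii: first c.f. just greater than N/2
            return x                 # step iii: corresponding value of x
    raise ValueError("empty distribution")


# Hypothetical distribution: x = 1, 2, 3, 4 with frequencies 1, 2, 3, 4 (N = 10).
print(discrete_median([1, 2, 3, 4], [1, 2, 3, 4]))  # 3
```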
Mode:
• Mode is the value which occurs most frequently in a set of
observations and around which the other items of the set cluster
densely.
• It is the point of maximum frequency or the point of greatest
density.
• In other words the mode or modal value of the distribution is that
value of the variate for which frequency is maximum.
Calculation of Mode
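In case of discrete distribution: Mode is the value of x
corresponding to maximum frequency. But in any one (or more) of
the following cases: (i) the maximum frequency is repeated, (ii) the
maximum frequency occurs at the very beginning or end of the distribution,
(iii) there are irregularities in the distribution, the mode is determined
by the method of grouping.
In case of continuous distribution: Mode is given by
$\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}$
where l is the lower limit, h the width and $f_1$ the frequency of the modal class, and $f_0$, $f_2$
are the frequencies of the classes preceding and succeeding the modal
class respectively. While applying the above formula it is necessary to
see that the class intervals are of the same size.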
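Method of grouping: column (1) lists the frequencies; (2) sums them in twos; (3) sums in twos leaving the first;
(4) sums in threes; (5) sums in threes leaving the first; (6) sums in threes leaving the first two.
Each group total is written against the first value of x in its group.
x     (1)   (2)   (3)   (4)   (5)   (6)
4     2     7           15
5     5           13          22
6     8     17                      29
7     9           21    35
8     12    26                40
9     14          28                43
10    14    29          40
11    15          26          39
12    11    24
13    13
Analysis table:
Column  Maximum  Values of x giving the maximum
(1)     15       11
(2)     29       10, 11
(3)     28       9, 10
(4)     40       10, 11, 12
(5)     40       8, 9, 10
(6)     43       9, 10, 11
Mode: x = 10 occurs most often (5 times) in the analysis table, so Mode = 10.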
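The grouping procedure is mechanical, so it can be sketched in Python. This is an illustrative implementation (the function name is hypothetical) that, like the table, uses only complete groups of two or three and marks every x value belonging to a group with the largest total in its column.

```python
from collections import Counter


def mode_by_grouping(xs, fs):
    """Method of grouping for the mode of a discrete distribution.

    Columns (1)-(6): single frequencies; sums in twos starting at the 1st and
    2nd values; sums in threes starting at the 1st, 2nd and 3rd values.
    In each column every group with the largest total is marked, and the
    x value marked most often is the mode.
    """
    tally = Counter()
    for size in (1, 2, 3):
        for offset in range(size):
            groups = [
                (sum(fs[start:start + size]), xs[start:start + size])
                for start in range(offset, len(fs) - size + 1, size)
            ]
            best = max(total for total, _ in groups)
            for total, members in groups:
                if total == best:
                    tally.update(members)   # mark every x in a maximal group
    top = max(tally.values())
    return [x for x in xs if tally[x] == top], tally


xs = list(range(4, 14))
fs = [2, 5, 8, 9, 12, 14, 14, 15, 11, 13]
modes, tally = mode_by_grouping(xs, fs)
print(modes, tally[10])  # [10] 5
```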
Q.1 Calculate the mean, median and mode of the following data:
Wages (in Rs)    0-20  20-40  40-60  60-80  80-100  100-120  120-140
No. of Workers   6     8      10     12     6       5        3
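A minimal sketch for grouped data with equal class widths. The mode uses the formula given earlier in these notes; the median uses the standard interpolation formula Median = l + (N/2 − c)/f × h (c is the cumulative frequency before the median class), which these notes do not state explicitly, so treat it as an assumption. A missing neighbouring class is taken to have frequency 0; the function name is hypothetical.

```python
def grouped_mean_median_mode(lowers, width, fs):
    """Mean, median and mode of a continuous distribution with equal classes."""
    n = sum(fs)
    mids = [l + width / 2 for l in lowers]          # class midpoints
    mean = sum(f * m for f, m in zip(fs, mids)) / n

    # Median = l + (N/2 - c) / f * h, c = c.f. before the median class.
    half = n / 2
    cum_before = 0
    median = None
    for l, f in zip(lowers, fs):
        if cum_before + f >= half:                  # first class reaching N/2
            median = l + (half - cum_before) / f * width
            break
        cum_before += f

    # Mode = l + h (f1 - f0) / (2 f1 - f0 - f2); a missing neighbour counts as 0.
    k = max(range(len(fs)), key=lambda i: fs[i])
    f1 = fs[k]
    f0 = fs[k - 1] if k > 0 else 0
    f2 = fs[k + 1] if k + 1 < len(fs) else 0
    mode = lowers[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)
    return mean, median, mode


mean, median, mode = grouped_mean_median_mode(
    [0, 20, 40, 60, 80, 100, 120], 20, [6, 8, 10, 12, 6, 5, 3]
)
print(round(mean, 2), round(median, 2), round(mode, 2))  # 62.4 61.67 65.0
```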
Moments
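• In mathematical statistics, moments are basic calculations that can be used
to find a probability distribution's mean, variance, and skewness.
• The r-th moment about the mean (r-th central moment) of a frequency distribution is
$\mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r, \quad r = 0, 1, 2, \ldots$
where $N = \sum_{i=1}^{n} f_i$.
In particular, $\mu_0 = 1$, $\mu_1 = 0$ and $\mu_2 = \sigma^2$ (the variance).
Note. In case of a frequency distribution with class intervals, the values
of $x_i$ are the midpoints of the intervals.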
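Example 1. Find the first four moments for the following individual
series.
Solution: Calculation of Moments
x: 3, 6, 8, 10, 18
Now $\bar{x} = \frac{3 + 6 + 8 + 10 + 18}{5} = 9$, and the deviations $x - 9$ are $-6, -3, -1, 1, 9$.
$\mu_1 = \frac{1}{5}\sum (x - 9) = 0$,
$\mu_2 = \frac{1}{5}\sum (x - 9)^2 = \frac{128}{5} = 25.6$,
$\mu_3 = \frac{1}{5}\sum (x - 9)^3 = \frac{486}{5} = 97.2$,
$\mu_4 = \frac{1}{5}\sum (x - 9)^4 = \frac{7940}{5} = 1588$.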
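The same arithmetic as a short Python sketch for an individual series (the function name is illustrative):

```python
def central_moments(values, max_order=4):
    """Mean and central moments mu_1 .. mu_max_order of an individual series."""
    n = len(values)
    mean = sum(values) / n
    moments = [sum((x - mean) ** r for x in values) / n
               for r in range(1, max_order + 1)]
    return mean, moments


mean, (mu1, mu2, mu3, mu4) = central_moments([3, 6, 8, 10, 18])
print(mean, mu1, mu2, mu3, mu4)  # 9.0 0.0 25.6 97.2 1588.0
```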
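Moments about an arbitrary point A are defined by
$\mu'_r = \frac{1}{N}\sum_{i} f_i (x_i - A)^r, \quad r = 0, 1, 2, \ldots$
where, writing $d_i = x_i - A$,
for r = 0: $\mu'_0 = 1$,
for r = 1: $\mu'_1 = \frac{1}{N}\sum f_i d_i = \bar{x} - A$,
for r = 2: $\mu'_2 = \frac{1}{N}\sum f_i d_i^2$, and so on.
In calculation work, if we find that there is some common factor h (> 1)
in the values of $d_i$, we can ease our calculation work by defining $u_i = \frac{x_i - A}{h}$.
In that case we have
$\mu'_r = \frac{h^r}{N}\sum_{i} f_i u_i^r$,
where
for r = 1: $\mu'_1 = \frac{h}{N}\sum f_i u_i$,
for r = 2: $\mu'_2 = \frac{h^2}{N}\sum f_i u_i^2$, and so on.
Relations between the central moments and the moments about A:
$\mu_2 = \mu'_2 - {\mu'_1}^2$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2{\mu'_1}^2 - 3{\mu'_1}^4$
(In each relation the sum of the coefficients on the right-hand side is zero.)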
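The three relations can be checked numerically with a small helper (its name is hypothetical). As sample input it uses the moments about the working mean A = 28.5 from the later example in these notes (0.294, 7.144, 42.409, 454.98).

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments mu_2, mu_3, mu_4 from the moments m_r about a point A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


A = 28.5
m = (0.294, 7.144, 42.409, 454.98)       # moments about A
mu2, mu3, mu4 = central_from_raw(*m)
print(A + m[0], round(mu2, 4), round(mu3, 4), round(mu4, 4))
```

The first printed value is the mean, since $\mu'_1 = \bar{x} - A$.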
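Example 3: Calculate the variance and third central moment from the
following data.
x   0   1   2   3   4   5   6   7   8
f   1   9   26  59  72  52  29  7   1
Solution: Calculation of Moments (about A = 4, with d = x - 4)
x      f     d     fd     fd²    fd³
0      1     -4    -4     16     -64
1      9     -3    -27    81     -243
2      26    -2    -52    104    -208
3      59    -1    -59    59     -59
4      72    0     0      0      0
5      52    1     52     52     52
6      29    2     58     116    232
7      7     3     21     63     189
8      1     4     4      16     64
Total  256         -7     507    -37
$\mu'_1 = \frac{-7}{256} \approx -0.0273$, $\mu'_2 = \frac{507}{256} \approx 1.9805$, $\mu'_3 = \frac{-37}{256} \approx -0.1445$
Variance $= \mu_2 = \mu'_2 - {\mu'_1}^2 \approx 1.9797$
Also $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3 \approx 0.0179$
This is the third central moment.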
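A sketch of the same calculation in Python, working with exact fractions so that intermediate values such as $\mu'_2 \approx 1.9805$ are not rounded before being combined. The helper name is hypothetical; A = 4 is the assumed origin used in the table.

```python
from fractions import Fraction


def moments_from_table(xs, fs, a):
    """Variance and third central moment of a frequency table, via d = x - a."""
    n = sum(fs)
    m1, m2, m3 = (Fraction(sum(f * (x - a) ** r for x, f in zip(xs, fs)), n)
                  for r in (1, 2, 3))
    variance = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    return variance, mu3


variance, mu3 = moments_from_table(range(9), [1, 9, 26, 59, 72, 52, 29, 7, 1], a=4)
print(float(variance), float(mu3))
```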
Skewness
• It tells us whether the distribution is symmetrical or not.
• It gives us an idea about the nature and degree of concentration of
observations about the mean.
• The empirical relation between mean, median and mode,
Mean − Mode = 3(Mean − Median), holds for a moderately skewed distribution.
Skewness:
• It means lack of symmetry.
• It gives us an idea about the shape of the curve which we can draw with the
help of the given data.
• A distribution is said to be skewed if:
Mean, median and mode fall at different points, i.e.,
Mean ≠ Median ≠ Mode;
• Quartiles are not equidistant from the median; and
• The curve drawn with the help of the given data is not symmetrical but
stretched more to one side than to the other.
Symmetrical Distribution
A symmetric distribution is a type of distribution where the left side of
the distribution mirrors the right side. In a symmetric distribution, the
mean, mode and median all fall at the same point.
Measures of Skewness:
The measures of skewness are:
• Sk = M − Md,
• Sk = M − Mo,
• Sk = (Q3 − Md) − (Md − Q1),
where M is the mean, Md the median, Mo the mode, Q1 the first quartile
and Q3 the third quartile of the distribution.
These are the absolute measures of skewness.
• Coefficients of Skewness: For comparing two series we do
not calculate these absolute measures but we calculate the relative measures
called the coefficients of skewness, which are pure numbers independent of
units of measurement.
Pearson's β and γ Coefficients:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3$
Kurtosis
• Describe the concepts of kurtosis
• Explain the different measures of kurtosis
• Explain how kurtosis describes the shape of a distribution.
Kurtosis
• If we know the measures of central tendency, dispersion and skewness, we
still cannot form a complete idea about the distribution. Let us consider
the figure in which the three curves A, B and C are all symmetrical about
the mean and have the same range.
• Curve of the type A, which is neither flat nor peaked, is called the normal
curve or mesokurtic curve; for such a curve β2 = 3, i.e., γ2 = 0.
• Curve of the type B, which is flatter than the normal curve, is known as a
platykurtic curve; for such a curve β2 < 3, i.e., γ2 < 0.
• Curve of the type C, which is more peaked than the normal curve, is called a leptokurtic
curve; for such a curve β2 > 3, i.e., γ2 > 0.
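The β and γ coefficients and the classification rules above can be sketched as follows (function names are hypothetical). The sample central moments μ2 = 16, μ3 = 64 and μ4 = 1024 correspond to a variance of 16, γ1 = +1 and β2 = 4, the values used in Q2 below.

```python
import math


def shape_coefficients(mu2, mu3, mu4):
    """Pearson's beta and gamma coefficients from central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / math.sqrt(mu2 ** 3)   # takes the sign of mu3
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3
    return beta1, gamma1, beta2, gamma2


def describe_shape(gamma1, gamma2, tol=1e-9):
    """Classify skewness by the sign of gamma1 and kurtosis by gamma2 = beta2 - 3."""
    if abs(gamma1) <= tol:
        skew = "symmetrical"
    elif gamma1 > 0:
        skew = "positively skewed"
    else:
        skew = "negatively skewed"
    if abs(gamma2) <= tol:
        kurt = "mesokurtic"
    elif gamma2 > 0:
        kurt = "leptokurtic"
    else:
        kurt = "platykurtic"
    return skew, kurt


b1, g1, b2, g2 = shape_coefficients(16, 64, 1024)
print(b1, g1, b2, g2, describe_shape(g1, g2))
```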
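Q2. For a distribution, the mean is 10, variance is 16, γ1 is +1 and β2 is 4. Comment
about the nature of the distribution. Also find the third central moment.
Solution: μ2 = σ² = 16, so σ = 4. Since γ1 = μ3/σ³ = +1, the third central moment is μ3 = σ³ = 64.
As γ1 > 0, the distribution is positively skewed; as β2 = 4 > 3, it is leptokurtic.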
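Example 3 The first four moments about the working mean 28.5 of a distribution
are 0.294, 7.144, 42.409 and 454.98. Calculate the first four moments about the mean.
Also evaluate $\beta_1$ and $\beta_2$ and comment upon the skewness and kurtosis of the
distribution.
Solution: Moments about the mean (the mean itself is 28.5 + 0.294 = 28.794):
$\mu_1 = 0$
$\mu_2 = \mu'_2 - {\mu'_1}^2 = 7.144 - (0.294)^2 \approx 7.0576$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3 \approx 36.1588$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2{\mu'_1}^2 - 3{\mu'_1}^4 \approx 408.7896$
$\beta_1 = \mu_3^2/\mu_2^3 \approx 3.719$, $\beta_2 = \mu_4/\mu_2^2 \approx 8.207$
Skewness: $\mu_3$ is positive, so the distribution is positively skewed.
Kurtosis: $\beta_2 > 3$, so the distribution is leptokurtic.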
Q1. Find all four central moments and discuss skewness and kurtosis
for the following distribution:
Q1. The First four moments of a distribution about are Find the first
four moments about mean. Discuss the Skewness and Kurtosis and also
comment upon the nature of the distribution.
Q2. Define the mode and calculate the mode for the distribution of
monthly rent paid by libraries in Karnataka:
Monthly rent      500-1000  1000-1500  1500-2000  2000-2500  2500-3000  3000 & above
No. of libraries  5         10         8          16         14         12
Q3. Write short notes on:
i. Range ii. Interquartile range iii. Mean deviation iv. Standard
deviation v. Variance
Q 4. Explain the measures of dispersion and also find the range &
Coefficient of Range for the following data: 20, 35, 25, 30, 15.
Moments
Relation between
Relation between
Skewness
Kurtosis
Curve Fitting
• The objective of curve fitting is to find the parameters of a
mathematical model that describes a set of data in a way that
minimizes the difference between the model and the data.
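Sol. Let the straight line obtained from the given data be
$y = a + bx$ ...(1)
then the normal equations are
$\sum y = ma + b\sum x$ ...(2)
$\sum xy = a\sum x + b\sum x^2$ ...(3)
where m = 5 is the number of pairs of observations.
Solving (2) and (3) we get a and b.
Required line is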
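Solving normal equations (2) and (3) for a and b can be sketched in Python as below. The function name is illustrative, and the sample data are hypothetical points lying exactly on y = 2 + 3x, so the fit must return a = 2 and b = 3.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b x from the normal equations (2) and (3)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (m * sxy - sx * sy) / det    # Cramer's rule on (2) and (3)
    a = (sy - b * sx) / m            # back-substitute into (2)
    return a, b


print(fit_line([0, 1, 2, 3, 4], [2, 5, 8, 11, 14]))  # (2.0, 3.0)
```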
Where ,
The normal equation for (1) are
Where ,
The normal equation to (1) are
Example: Use the method of least squares to fit the curve:
to the following table of values:
so we have
x   0   1   2   3   4
y   1   0   3   10  21
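The curve in this example did not survive extraction. Assuming it is the second degree parabola y = a + bx + cx² (fitting of a second degree parabola is in the topic list), a minimal sketch solves the three normal equations exactly with fractions. The helper names are hypothetical. For this table the fit turns out to be exact, y = 1 − 3x + 2x², which can be checked by substituting x = 0, ..., 4.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination on exact fractions (system must be non-singular)."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(n):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col] / rows[col][col]
                rows[i] = [p - factor * q for p, q in zip(rows[i], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_parabola(xs, ys):
    """Least-squares y = a + b x + c x^2 via the three normal equations."""
    def s(k):                      # sum of x^k
        return sum(x ** k for x in xs)

    def t(k):                      # sum of x^k * y
        return sum(x ** k * y for x, y in zip(xs, ys))

    matrix = [[len(xs), s(1), s(2)],
              [s(1), s(2), s(3)],
              [s(2), s(3), s(4)]]
    return solve_linear(matrix, [t(0), t(1), t(2)])


a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 0, 3, 10, 21])
print(a, b, c)  # 1 -3 2
```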
Moments
Relation between
Relation between
Skewness & kurtosis
Curve fitting
Correlation
• Identify the direction and strength of a correlation between two factors.
• Compute and interpret the Pearson correlation coefficient and test for
significance.
• Compute and interpret the coefficient of determination.
• Compute and interpret the Spearman correlation coefficient and test for
significance.
Negative Correlation:
• If the two variables deviate in opposite directions, i.e., if an increase (or
decrease) in one results in a corresponding decrease (or increase) in the other,
the correlation is said to be inverse or negative.
• For example, the correlation between (i) the price and demand of a
commodity, and (ii) the volume and pressure of a perfect gas; is
negative.
Perfect Correlation:
•Correlation is said to be perfect if the deviation in one variable is followed
by a corresponding and proportional deviation in the other.
Correlation Coefficient:
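• The correlation coefficient due to Karl Pearson is defined as a measure of the
intensity or degree of linear relationship between two variables.
• Karl Pearson's Correlation Coefficient
• Karl Pearson's correlation coefficient between two variables X and Y,
denoted by r(X, Y) or $r_{XY}$, is a measure of linear relationship between
them and is defined as:
• $r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$
• If $(x_i, y_i)$, i = 1, 2, ..., n is the bivariate distribution, then
$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}), \quad \sigma_X^2 = \frac{1}{n}\sum (x_i - \bar{x})^2, \quad \sigma_Y^2 = \frac{1}{n}\sum (y_i - \bar{y})^2$
Or
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
Here n is the number of pairs of values of (x, y).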
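Note: The correlation coefficient is independent of change of origin and
scale.
Let us define two new variables
$u = \frac{x - a}{h}, \quad v = \frac{y - b}{k}$
where a, b, h, k are constants with h, k > 0.
Then $r(X, Y) = r(U, V)$.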
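Both the raw-sum form of r and its invariance under change of origin and scale can be checked with a short sketch. The function name is illustrative; the data are those of the worked example with x = 10, 14, ..., 30, using u = (x − 22)/4 and v = (y − 24)/6.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums (the 'Or' form of the formula)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


xs = [10, 14, 18, 22, 26, 30]
ys = [18, 12, 24, 6, 30, 36]
us = [(x - 22) / 4 for x in xs]   # change of origin (22) and scale (4)
vs = [(y - 24) / 6 for y in ys]   # change of origin (24) and scale (6)
print(round(pearson_r(xs, ys), 6), round(pearson_r(us, vs), 6))  # 0.6 0.6
```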
x    y    x²    y²    xy
1    8    1     64    8
3    12   9     144   36
5    15   25    225   75
7    17   49    289   119
8    18   64    324   144
10   20   100   400   200
Total: Σx = 34, Σy = 90, Σx² = 248, Σy² = 1446, Σxy = 582
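x   10  14  18  22  26  30
y   18  12  24  6   30  36
Solution: Let $u = \frac{x - 22}{4}$ and $v = \frac{y - 24}{6}$.
x     y     u     v     u²    v²    uv
10    18    -3    -1    9     1     3
14    12    -2    -2    4     4     4
18    24    -1    0     1     0     0
22    6     0     -3    0     9     0
26    30    1     1     1     1     1
30    36    2     2     4     4     4
Total       -3    -3    19    19    12
Hence, n = 6, and since r is independent of change of origin and scale,
$r(X, Y) = r(U, V) = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left(n\sum u^2 - (\sum u)^2\right)\left(n\sum v^2 - (\sum v)^2\right)}} = \frac{72 - 9}{\sqrt{105 \times 105}} = 0.6$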
Marks \ x   18   19   20   21   Total
10-20       4    2    2    -    8
20-30       5    4    6    4    19
30-40       6    8    10   11   35
40-50       4    4    6    8    22
50-60       -    2    4    4    10
60-70       -    2    3    1    6
Total       19   22   31   28   100
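y     Marks   18   19   20   21   f     v     fv    fv²   fuv
15    10-20   4    2    2    -    8     -3    -24   72    30
25    20-30   5    4    6    4    19    -2    -38   76    20
35    30-40   6    8    10   11   35    -1    -35   35    9
45    40-50   4    4    6    8    22    0     0     0     0
55    50-60   -    2    4    4    10    1     10    10    2
65    60-70   -    2    3    1    6     2     12    24    -2
f             19   22   31   28   100         -75   217   59
u             -2   -1   0    1    Total
fu            -38  -22  0    28   -32
fu²           76   22   0    28   126
fuv           56   16   0    -13  59
where the entries under 18, 19, 20, 21 are the frequencies for those values of x, u = x - 20,
y is the mid-value of the marks class and v = (y - 45)/10.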
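The correlation coefficient of this bivariate frequency table can be computed from the coded variables u and v, since r is unaffected by change of origin and scale. In this sketch the blank cells are taken as zero frequencies, placed so that the row totals, column totals and the fv, fu and fuv sums of the table all agree; the variable names are illustrative.

```python
import math

u_values = [-2, -1, 0, 1]               # u = x - 20 for x = 18, 19, 20, 21
v_values = [-3, -2, -1, 0, 1, 2]        # v = (y - 45) / 10 for mid-values 15, ..., 65
freq = [                                # rows: marks 10-20, ..., 60-70
    [4, 2, 2, 0],
    [5, 4, 6, 4],
    [6, 8, 10, 11],
    [4, 4, 6, 8],
    [0, 2, 4, 4],
    [0, 2, 3, 1],
]

# One (f, u, v) triple per cell of the table.
cells = [(f, u, v) for row, v in zip(freq, v_values) for f, u in zip(row, u_values)]
N = sum(f for f, _, _ in cells)
s_u = sum(f * u for f, u, _ in cells)
s_v = sum(f * v for f, _, v in cells)
s_uu = sum(f * u * u for f, u, _ in cells)
s_vv = sum(f * v * v for f, _, v in cells)
s_uv = sum(f * u * v for f, u, v in cells)

r = (N * s_uv - s_u * s_v) / math.sqrt((N * s_uu - s_u ** 2) * (N * s_vv - s_v ** 2))
print(N, s_u, s_v, s_uu, s_vv, s_uv, round(r, 4))  # 100 -32 -75 126 217 59 0.2566
```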
RANK CORRELATION:
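Definition: Assuming that no two individuals are bracketed equal in either
classification, each of the variables X and Y takes the values 1, 2, ..., n.
Hence, the rank correlation coefficient between X and Y is denoted by ρ
and is given as:
$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
where $d_i$ is the difference between the two ranks of the i-th individual.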
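Person            A   B   C   D   E   F   G   H   I   J
Rank in maths     9   10  6   5   7   2   4   8   1   3
Rank in physics   1   2   3   4   5   6   7   8   9   10
Person  R1   R2   D = R1 - R2   D²
A       9    1    8             64
B       10   2    8             64
C       6    3    3             9
D       5    4    1             1
E       7    5    2             4
F       2    6    -4            16
G       4    7    -3            9
H       8    8    0             0
I       1    9    -8            64
J       3    10   -7            49
Total             0             280
where R1 and R2 are the ranks in maths and physics. Here n = 10, so
$\rho = 1 - \frac{6 \times 280}{10 \times (10^2 - 1)} = 1 - \frac{1680}{990} \approx -0.697$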
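A minimal sketch of the formula for untied integer ranks, using exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for untied integer ranks."""
    n = len(ranks_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - Fraction(6 * d_squared, n * (n * n - 1))


maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]      # persons A..J
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rho = spearman_rho(maths, physics)
print(rho, float(rho))  # -23/33, about -0.697
```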
Uses:
• It is used for finding correlation coefficient if we are dealing with
qualitative characteristics which cannot be measured quantitatively but
can be arranged serially.
• It can also be used where actual data are given.
• In case of extreme observations, Spearman’s formula is preferred to
Pearson’s formula.
Limitations:
• It is not applicable in the case of bivariate frequency distribution.
• For n > 30, this formula should not be used unless the ranks are given,
since in the contrary case the calculations are quite time-consuming.
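X          68   64   75   50   64   80   75   40   55   64   Total
Y          62   58   68   45   81   60   68   48   50   70
Rank in X  4    6    2.5  9    6    1    2.5  10   8    6
Rank in Y  5    7    3.5  10   1    6    3.5  9    8    2
D          -1   -1   -1   -1   5    -5   -1   1    0    4    0
D²         1    1    1    1    25   25   1    1    0    16   72
(Ranks are assigned in descending order; tied values share the mean of the ranks they occupy.)
Repeated values: in X, 75 occurs 2 times and 64 occurs 3 times; in Y, 68 occurs 2 times.
Each group of m tied values adds a correction m(m² − 1)/12 to ΣD²:
$\rho = 1 - \frac{6\left[\sum D^2 + \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right]}{n(n^2 - 1)} = 1 - \frac{6(72 + 0.5 + 2 + 0.5)}{10 \times 99} = 1 - \frac{450}{990} \approx 0.545$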
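A sketch of the tied-rank procedure: average ranks for ties, then the m(m² − 1)/12 correction added to ΣD² for each repeated value. The function names are hypothetical, and ranks are assigned in descending order (largest value gets rank 1).

```python
from collections import Counter
from fractions import Fraction


def average_ranks(values):
    """Rank in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        positions = [i + 1 for i, w in enumerate(ordered) if w == v]
        rank_of[v] = Fraction(sum(positions), len(positions))
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of m (m^2 - 1) / 12 over every value repeated m > 1 times."""
    return sum((Fraction(m * (m * m - 1), 12)
                for m in Counter(values).values() if m > 1), Fraction(0))


def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    rho = 1 - 6 * (d_squared + correction) / Fraction(n * (n * n - 1))
    return rho, d_squared, correction


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho, d2, cf = spearman_with_ties(x, y)
print(d2, cf, rho)  # 72 3 6/11
```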
Q1. Find the rank correlation coefficient for the following data:
X   23  27  28  28  29  30  31  33  35  36
Y   18  20  22  27  21  29  27  29  28  29
Correlation
Karl Pearson coefficient of correlation
Rank Correlation
Tied Rank
Regression
• Explanation of the variation in the dependent variable, based on the
variation in independent variables and Predict the values of the
dependent variable.
REGRESSION ANALYSIS:
• Regression measures the nature and extent of
correlation. Regression is the estimation or prediction of unknown
values of one variable from known values of another variable.
Difference between curve fitting and regression analysis: The only
fundamental difference, if any between problems of curve fitting and
regression is that in regression, any of the variables may be considered
as independent or dependent while in curve fitting, one variable cannot
be dependent.
Curve of regression and regression equation:
• If two variates are correlated i.e., there exists an association or
relationship between them, then the scatter diagram
will be more or less concentrated round a curve. This curve is called the
curve of regression and the relationship is said to be expressed by
means of curvilinear regression.
• The mathematical equation of the regression curve is called
regression equation.
LINEAR REGRESSION:
• When the points of the scatter diagram are concentrated around a straight
line, the regression is called linear and this straight line is known as
the line of regression.
• Regression will be called non-linear if there exists a relationship
other than a straight line between the variables under consideration.
LINES OF REGRESSION:
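Let $y = a + bx$ ...(1)
be the equation of the regression line of y on x. Its normal equations are
$\sum y = na + b\sum x$ ...(2)
$\sum xy = a\sum x + b\sum x^2$ ...(3)
Solving (2) and (3) for 'a' and 'b' we get
$b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}$ ...(4)
Dividing (2) by n gives
$\bar{y} = a + b\bar{x}$ ...(5)
Eq. (5) shows that the point $(\bar{x}, \bar{y})$ satisfies (1).
Hence the line passes through the point $(\bar{x}, \bar{y})$.
Putting $a = \bar{y} - b\bar{x}$ in equation (1), we get
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ...(6)
Eq. (6) is called the regression line of y on x; $r\sigma_y/\sigma_x$ is called the regression coefficient of y on x
and is usually denoted by $b_{yx}$.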
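A minimal sketch of the two regression lines, computing $b_{yx} = \operatorname{Cov}(x, y)/\sigma_x^2$ and $b_{xy} = \operatorname{Cov}(x, y)/\sigma_y^2$. The function name is illustrative, and the data reuse the earlier Karl Pearson example (x = 10, 14, ..., 30), for which r = 0.6. The last line checks that the product of the two coefficients equals r².

```python
def regression_lines(xs, ys):
    """Means and regression coefficients b_yx = Cov/var_x and b_xy = Cov/var_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return mx, my, cov / var_x, cov / var_y


xs = [10, 14, 18, 22, 26, 30]
ys = [18, 12, 24, 6, 30, 36]
mx, my, b_yx, b_xy = regression_lines(xs, ys)
print(f"y - {my} = {b_yx:.2f}(x - {mx})")   # regression line of y on x
print(f"x - {mx} = {b_xy:.2f}(y - {my})")   # regression line of x on y
print(round(b_yx * b_xy, 6))                # equals r^2 = 0.36
```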
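If one regression coefficient is greater than 1, the other must be less than 1.
Let $b_{yx} > 1$; then $\frac{1}{b_{yx}} < 1$.
Since $b_{yx} \cdot b_{xy} = r^2 \le 1$, we have $b_{xy} \le \frac{1}{b_{yx}} < 1$.
Similarly, if $b_{xy} > 1$, then $b_{yx} < 1$.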
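Property 3. The arithmetic mean of the regression coefficients is greater than or equal to
the correlation coefficient (for r > 0).
Proof. We have to prove that $\frac{b_{yx} + b_{xy}}{2} \ge r$, i.e.,
$\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) \ge r \iff \sigma_y^2 + \sigma_x^2 \ge 2\sigma_x\sigma_y \iff (\sigma_x - \sigma_y)^2 \ge 0,$
which is true.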
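Property 4: Regression coefficients are independent of the origin but
not of scale.
Proof. Let $u = \frac{x - a}{h}$ and $v = \frac{y - b}{k}$ with h, k > 0, so that $r_{xy} = r_{uv}$, $\sigma_x = h\sigma_u$ and $\sigma_y = k\sigma_v$. Then
$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{k}{h}\, r\frac{\sigma_v}{\sigma_u} = \frac{k}{h}\, b_{vu}$.
Similarly, $b_{xy} = \frac{h}{k}\, b_{uv}$.
Thus $b_{yx}$ and $b_{xy}$ are both independent of a and b but not of h and k.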
Regression line of
=……..(3)
=0.768….(4)
Multiply equations(3) and (4) we get
1000
Solving the above equation ,we get =6, =1
Coefficient of variability of
Coefficient of variability of
Required ratio=
Non-linear Regression:
Let $y = a + bx + cx^2$
be a second degree parabolic curve of regression of y on x.
1 2 3 4
12 18 24 30
0 1 2 3
Sol. Let
be determined by following equations.
156=6a+20b+14c
Q1. Fit a straight line trend by the method of least square to the
following data:
Year        1979  1980  1981  1982  1983  1984
Production  5     7     9     10    12    17
X 1 2 3 4 5
Y 2 4 5 3 6
Q4. Fit a straight line trend by the method of least squares to the
following data: -
Year                          2012  2013  2014  2015  2016  2017
Sales of T.V. sets (in '000)  7     10    12    14    17    24
Q3. Sum of squares of items 2430, mean is 7 N=12, find the variance.
i. 176.5
ii. 12.38
iii. 153.26
iv. 14
Q4. Calculate the standard deviation of the following:
9, 8, 6, 5, 8, 6
i. 2
ii. 3
iii. 1.414
iv. 2.414
a. r(x,y)
Q1 Obtain the normal equations by the method of least squares for the curve. Fit it to the
following data:
0.1 0.2 0.4 0.5 1 2
21 11 7 6 5 6
Q2. Find the multiple linear regressions of on and from the data relating to
three variables:
7 12 17 20
4 7 9 12
1 2 5 8
Q3. If θ is the angle between the two lines of regression, then express tan θ in terms of the
correlation coefficient. Explain the significance when r = 0 and r = ±1.
Q4. Two lines of regression are given by and and =16.Calculate-(i) the
mean of and (ii) S.D. of (iii) the correlation coefficient.
Q6. The first four moments of a distribution about 2 are 1, 2.5, 5.5 and
16 respectively. Calculate the four moments about the mean and about the origin.
Text Books
• Erwin Kreyszig, Advanced Engineering Mathematics, 9th Edition,
John Wiley & Sons, 2006.
Reference Books
• B.S. Grewal, Higher Engineering Mathematics, Khanna Publishers,
35th Edition, 2000.
• T. Veerarajan, Engineering Mathematics (for semester III), Tata McGraw-Hill, New Delhi.