Basic Statistics and Probability
Prepared by Ranju Mohan, PhD student, CE dept, IITM
For CE302 class by Gitakrishnan Ramadurai, AP, CE dept, IITM
Statistics:
Working with data: mean, median, mode, etc.; standard deviation, variance, range, quartiles, etc.
Inferential statistics: sample + concepts of probability → statistical estimate of the population
Speed data (km/hr.): 32.4, 27.8, 43.5, 52.3, 36.9, 29.5, 32.4, 49.8, 58.9, 32.4 (n = 10, mean = 39.59)

Range = Max value – Min value = 58.9 – 27.8 = 31.1

Absolute deviation: |value – mean|
Ex: Absolute deviation = |52.3 – 39.59| = 12.71

Arranging in increasing order: 27.8, 29.5, 32.4, 32.4, 32.4, 36.9, 43.5, 49.8, 52.3, 58.9

Quartiles: the 1st, 2nd, 3rd and 4th quartiles each cover 25% of the data. With n = 10, the k-th quartile is the [k(10) + 2]/4-th value of the ordered data:
Q1: [1(10) + 2]/4 = 3rd value = 32.4
Q2: [2(10) + 2]/4 = 5.5th value = 34.65
Q3: [3(10) + 2]/4 = 8th value = 49.8

Interquartile range = Q3 – Q1 = 17.4
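The range and quartile calculations for this speed data can be checked with a short script. This is a minimal Python sketch, not part of the original slides: the helper names `value_at_position` and `quartile` are illustrative, and a fractional position (such as the 5.5th value) is assumed to mean linear interpolation between the two neighbouring ordered values.

```python
from math import floor

# Speed data (km/hr.) from the slide
speeds = [32.4, 27.8, 43.5, 52.3, 36.9, 29.5, 32.4, 49.8, 58.9, 32.4]


def value_at_position(ordered, pos):
    """Value at a 1-based, possibly fractional, position in ordered data.

    Assumption: a fractional position is interpolated linearly
    between the two neighbouring values.
    """
    lower = floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def quartile(data, k):
    """k-th quartile using the slide's position rule (k*n + 2)/4."""
    ordered = sorted(data)
    n = len(ordered)
    return value_at_position(ordered, (k * n + 2) / 4)


data_range = max(speeds) - min(speeds)
q1, q2, q3 = (quartile(speeds, k) for k in (1, 2, 3))
iqr = q3 - q1

print(f"Range = {data_range:.1f}")
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.1f}")
```

Other software may use a different position rule for quartiles, so results can differ slightly from the rule used here.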
Descriptive statistics - Measures of spread
Variance:
$s^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}$

Speed (km/hr.)    $(x_i - \bar{x})^2$
32.4              (32.4 – 39.59)²
27.8              (27.8 – 39.59)²
43.5              (43.5 – 39.59)²
52.3              (52.3 – 39.59)²
36.9              (36.9 – 39.59)²
29.5              (29.5 – 39.59)²
32.4              (32.4 – 39.59)²
49.8              (49.8 – 39.59)²
58.9              (58.9 – 39.59)²
32.4              (32.4 – 39.59)²
                  ∑ = 1057.09

s² = 1057.09/9 = 117.45

Standard deviation:
$s = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}}$ = √(1057.09/9) = √117.45 = 10.84
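As a cross-check of the hand calculation, the sum of squared deviations, sample variance and standard deviation can be computed with Python's standard `statistics` module. This is a sketch rather than part of the original slides; `statistics.variance` and `statistics.stdev` both use the n − 1 divisor of the formulas above.

```python
import statistics

# Speed data (km/hr.) from the slide
speeds = [32.4, 27.8, 43.5, 52.3, 36.9, 29.5, 32.4, 49.8, 58.9, 32.4]

mean = statistics.mean(speeds)

# Sum of squared deviations from the mean, term by term
sum_sq = sum((x - mean) ** 2 for x in speeds)

# Sample variance and standard deviation (divisor n - 1)
variance = statistics.variance(speeds)
std_dev = statistics.stdev(speeds)

print(f"mean = {mean:.2f}")
print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```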
Data:
Groups of information that represent the qualitative or
quantitative attributes of a variable or a set of variables.
Visual/Graphical Representation:
Frequency distributions
Graphs
Box plot
Scatter plot
Stem and leaf
Data representation: Examples
Bar chart

        No. of accidents
Year    Fatal    Non-fatal    Total
2002    4        7            11
2003    7        6            13
2004    4        6            10
2005    4        4            8
2006    2        4            6
2007    3        3            6

[Figure: graphical representation as a bar chart; axes: Year, Total no. of accidents]

Frequency polygon

Year    No. of accidents
2002    11
2003    13
2004    10
2005    8
2006    6
2007    6

[Figure: frequency polygon; axes: Year, No. of accidents]
Data representation: Examples
Pie Diagram
[Figure: pie diagram]

Stem and Leaf plot
Stem | Leaves
4    | 1.1 2.5 2.8 2.9 3.2 3.8 4.2 4.7 5.1 6.2 6.4 6.5 7.5 7.5 7.7 7.8 8.8 9 9.9
5    | 0.3 1.7 2.3 2.8 2.9 3.3 4.3 4.8 5.1 5.9 6.4 6.9 7.3 8.7
Ex: stem 5 with leaf 7.3 represents 57.3
Data representation: Examples
Histogram

Given speed data (km/hr.):
63.2 49.9 36.9 44.2 54.8 49 42.9 32.4
54.3 37.5 45.1 51.7 47.5 43.8 55.9 48.8
41.1 47.5 52.3 39.2 57.3 36.3 42.8 58.7
52.9 42.5 46.4 53.3 46.5 43.2 56.9 47.7
47.8 35.6 50.3 44.7 46.2 38.4 62.4 49.4
56.4 55.1 64.8 52.8

Group into different speed classes:

Speed class (km/hr.)    No. of vehicles
30-35                   1
35-40                   6
40-45                   8
45-50                   12
50-55                   8
55-60                   6
60-65                   3

[Figure: histogram; x-axis: Speed (km/hr.), 30 to 65 in steps of 5; y-axis: No. of vehicles]
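Grouping the raw speeds into classes can be sketched in Python as below. The variable names are illustrative, and each class is assumed to include its lower limit and exclude its upper limit (no observation here falls exactly on a class limit, so this choice does not affect the counts).

```python
# Speed data (km/hr.) from the slide, 44 observations
speeds = [
    63.2, 49.9, 36.9, 44.2, 54.8, 49, 42.9, 32.4,
    54.3, 37.5, 45.1, 51.7, 47.5, 43.8, 55.9, 48.8,
    41.1, 47.5, 52.3, 39.2, 57.3, 36.3, 42.8, 58.7,
    52.9, 42.5, 46.4, 53.3, 46.5, 43.2, 56.9, 47.7,
    47.8, 35.6, 50.3, 44.7, 46.2, 38.4, 62.4, 49.4,
    56.4, 55.1, 64.8, 52.8,
]

LOWER, WIDTH, N_CLASSES = 30, 5, 7  # classes 30-35, 35-40, ..., 60-65

counts = [0] * N_CLASSES
for s in speeds:
    # Index of the class whose lower limit is at or below the speed
    counts[int((s - LOWER) // WIDTH)] += 1

for i, c in enumerate(counts):
    lo = LOWER + i * WIDTH
    print(f"{lo}-{lo + WIDTH}: {c}")
```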
Data representation: Examples
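Ogive
[Figure: ogive (cumulative frequency curve); x-axis: Speed (km/hr.)]
Q: How many vehicles have a speed less than 50 km/hr.?
Ans: 27 (cumulative frequency of the speed classes below 50 km/hr.: 1 + 6 + 8 + 12)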
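An ogive plots cumulative frequency against the upper limit of each class. The following is a minimal Python sketch, assuming the ogive is built from the speed-class frequencies of the histogram example (the slide does not state this explicitly); the variable names are illustrative.

```python
from itertools import accumulate

# Upper class limits (km/hr.) and class frequencies from the histogram example
upper_limits = [35, 40, 45, 50, 55, 60, 65]
frequencies = [1, 6, 8, 12, 8, 6, 3]

# Running total: number of vehicles slower than each upper limit
cumulative = list(accumulate(frequencies))
ogive = dict(zip(upper_limits, cumulative))

below_50 = ogive[50]
print(f"Vehicles with speed less than 50 km/hr.: {below_50}")
```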
Data representation: Examples
Scatter plot

Paired data set:
(Number of accidents, Vehicle speed)

Speed (km/hr.)    No. of accidents
22                1
45                3
32                4
75                8
66                10
58                6

How to relate?
[Figure: scatter plot of the paired data]
Data representation: Examples
Random experiment → outcomes x1, x2, x3, x4, x5
Random variable (a function): assigns unique values to the outcomes
Types of random variable: discrete, continuous
Probability concepts – Random variables
Discrete random variable
Probability mass function, pmf = p(x)
$p(x) = P(X = x)$
$0 \le p(x) \le 1$
$\sum_x p(x) = 1$

Continuous random variable
Probability density function, pdf = f(x)
$\int_a^b f(x)\,dx = P(a \le x \le b)$
$f(x) \ge 0$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

Cumulative distribution function: $F(x) = P(X \le x)$

Examples:

Discrete:
x    P(X = x)
1    0.13
2    0.27
3    0.25
4    0.15
5    0.20
F(3) = 0.13 + 0.27 + 0.25 = 0.65

Continuous:
f(x) = x, 0 ≤ x ≤ √2; 0 otherwise
F(x) = x²/2 for 0 ≤ x ≤ √2
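Both cumulative distribution examples can be reproduced numerically. This is a minimal Python sketch with illustrative function names; the continuous case is integrated with a simple midpoint rule, chosen here only for illustration.

```python
from math import sqrt

# Discrete example: pmf table from the slide
pmf = {1: 0.13, 2: 0.27, 3: 0.25, 4: 0.15, 5: 0.20}


def discrete_cdf(x):
    """F(x) = P(X <= x): sum the pmf over all values up to x."""
    return sum(p for value, p in pmf.items() if value <= x)


# Continuous example: f(x) = x on [0, sqrt(2)], 0 otherwise
def pdf(x):
    return x if 0 <= x <= sqrt(2) else 0.0


def continuous_cdf(x, steps=10_000):
    """F(x) as the integral of the pdf from 0 to x (midpoint rule)."""
    upper = min(max(x, 0.0), sqrt(2))
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h


print(f"F(3) = {discrete_cdf(3):.2f}")          # discrete case
print(f"F(1) = {continuous_cdf(1.0):.3f}")      # compare with 1**2 / 2
print(f"total area = {continuous_cdf(2.0):.3f}")
```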
Probability concepts – Random variables
Discrete random variable: $E(X^2) = \sum_i x_i^2\, p(x_i)$
Continuous random variable: $E(X^2) = \int x^2 f(x)\,dx$
t- distribution
F – distribution
Special Random variables and probability distributions
Discrete Random variables
Bernoulli:
Two possible outcomes for one trial: ‘success’ (X = 1) or ‘failure’ (X = 0)
pmf: $P(X = 0) = 1 - p$; $P(X = 1) = p$; $0 \le p \le 1$
Poisson: Mean = ν; Variance = ν
Continuous Random variables
Normal:
pdf: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$; $-\infty < x < \infty$
Mean = µ; Variance = σ²
Special Random variables and probability distributions
Continuous Random variables
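Normal random variable, z:
$z = \frac{x - \mu}{\sigma}$
When µ = 0 and σ = 1:
pdf: $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$; $-\infty < x < \infty$
[Figure: standard normal curve with −z, 0 and z marked on the axis]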
Exponential:
$P(x) = \lambda e^{-\lambda x}$ if $x \ge 0$; $P(x) = 0$ if $x < 0$
Binomial, n = 5; p = 1/5:
x    P(X = x)
2    0.20
3    0.05
4    0.006
5    0.0003
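The tabulated values for n = 5 and p = 1/5 agree with a binomial pmf, P(X = x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ. A minimal Python sketch, with an illustrative function name, that reproduces the full table including x = 0 and x = 1:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 1 / 5
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in table.items():
    print(f"P(X = {x}) = {prob:.4f}")
```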
Examples
Ques: For 3 different routes at a particular location, the probabilities of
choice are 0.35, 0.40 and 0.25 respectively. What is the probability that, out
of 5 vehicles reaching the location, one, three and two vehicles will choose
routes 1, 2 and 3 respectively?
By the multinomial distribution,
$p(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1}\, p_2^{x_2} \cdots p_k^{x_k}$
$p(1, 3, 2) = \frac{5!}{1!\, 3!\, 2!}\, (0.35)^1 (0.40)^3 (0.25)^2$
Ans: 0.014
Examples
On a motorway, the number of vehicles arriving from one direction in
successive 10 sec intervals was counted and is given below. Find the
probabilities P(0), P(X > 3), P(3 < X < 6), etc.
$P(x) = \lambda e^{-\lambda x}$
$P(X \le x) = \int_0^x \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda x}$
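$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
[Figure: standard normal curve with area 1 − α between −z_{α/2} and z_{α/2}, and area α/2 in each tail]
$|\bar{X} - \mu| \le z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$
$E = |\bar{X} - \mu| = z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$, where α is the level of significance
Level of significance: the probability that the computed estimate will lie
outside the indicated range. The probability that it lies within the range is the confidence level, 1 − α.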
Example
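While determining the mean speed of vehicles on a section of a road, an engineer
wants to be able to assert with 95% confidence that the estimated mean speed is off
by at most 2.5 km/hr. If the std. deviation is 8.2 km/hr., how large a sample is needed?
$E = |\bar{X} - \mu| = z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$
$1 - \alpha = 0.95$; $\alpha = 0.05$; $\alpha/2 = 0.025$; $z_{\alpha/2} = 1.96$
[Figure: standard normal curve with central area 0.95 and area 0.025 in each tail]
$2.5 = \frac{1.96 \times 8.2}{\sqrt{n}}$, so $n = \left(\frac{1.96 \times 8.2}{2.5}\right)^2 \approx 41.3$
Sample size, n = 42 (rounded up so that the error does not exceed 2.5 km/hr.)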
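The sample-size calculation can be sketched in Python as follows. The function name is illustrative; the z value is taken from `statistics.NormalDist` rather than a printed table, and the result is rounded up on the assumption that the error bound must not be exceeded.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(confidence, sigma, max_error):
    """Return (unrounded n, n rounded up) from E = z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    exact = (z * sigma / max_error) ** 2
    return exact, ceil(exact)


exact_n, n = required_sample_size(confidence=0.95, sigma=8.2, max_error=2.5)
print(f"unrounded n = {exact_n:.2f}, sample size = {n}")
```

Rounding to the nearest integer instead would leave the error marginally above 2.5 km/hr. whenever the unrounded value is rounded down.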
Central Limit theorem
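Confidence interval (C.I.) for the population mean µ
$\text{C.I.} = \left(\bar{X} - z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}\right)$
For $1 - \alpha = 0.95$: $\alpha/2 = 0.025$ and $z_{\alpha/2} = 1.96$
[Figure: standard normal curve with area 0.025 in each tail, beyond −1.96 and 1.96]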
Example:
A random sample of size 100 is taken from a population with std. deviation
5.1. Given that the sample mean is 21.6, construct a 95% confidence interval.
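A possible worked solution, sketched in Python under the assumption that the population standard deviation (5.1) is known, so that the z-based interval applies. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Interval: sample_mean -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(sample_mean=21.6, sigma=5.1, n=100, confidence=0.95)
print(f"95% C.I. = ({lower:.2f}, {upper:.2f})")
```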
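Standard normal distribution: z
Chi-square distribution with ‘n’ degrees of freedom: $\chi^2_{\alpha,n}$
pdf: $f(x) = \frac{1}{2^{n/2}\left(\frac{n}{2} - 1\right)!}\, x^{\frac{n}{2} - 1}\, e^{-\frac{x}{2}}$, $x > 0$
For odd n, $\left(\frac{n}{2} - 1\right)!$ is read as the gamma function $\Gamma\left(\frac{n}{2}\right)$.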
Distributions from Normal distribution
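z – Random variable with standard normal distribution
$\chi^2_n$ – Random variable with Chi-square distribution
t-distribution with ‘n’ degrees of freedom:
$t_n = \frac{z}{\sqrt{\chi^2_n / n}}$
$P(t_n \ge t_{\alpha,n}) = \alpha$
As n becomes large, $\chi^2_n / n \to 1$ and $t_n \to z$
[Figure: t-distribution curve with the upper-tail area α beyond $t_{\alpha,n}$]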
Distributions from Normal distribution
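For independent chi-square random variables $\chi^2_n$ and $\chi^2_m$,
F-distribution with degrees of freedom ‘n’ and ‘m’:
$F_{n,m} = \frac{\chi^2_n / n}{\chi^2_m / m}$
$P(F_{n,m} \ge F_{\alpha,n,m}) = \alpha$
$F_{1-\alpha,m,n} = \frac{1}{F_{\alpha,n,m}}$
[Figure: F-distribution curve starting at 0, with the upper-tail area α beyond $F_{\alpha,n,m}$]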
How to use these sampling distributions to draw conclusions?
Hypothesis testing
Concerned with two distinct choices:
Null Hypothesis (H0)
Alternate hypothesis (H1)
Test whether to accept or reject H0 using various test statistics.
Two types of errors:
Type I: rejecting H0 when it is true
Type II: accepting H0 when it is false
Solution
Distribution statistics in hypothesis testing
Que. No. 2: The mean spot speed of 15 vehicles observed on a
Sunday at a particular roadway was 81.2 km/hr. As per previous
records, the mean speed of all vehicles at this location was
75.5 km/hr. with std. dev. 10.2 km/hr. Is there sufficient
evidence to show that the speeds of vehicles on that Sunday were
higher than the average speed? Take the level of significance as
5%.
Solution
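A possible worked solution, as a Python sketch. Assumptions not stated on the slide: the historical std. dev. (10.2 km/hr.) is treated as the known population value, so a one-sided z-test is used with H0: µ = 75.5 and H1: µ > 75.5; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (sample mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


alpha = 0.05
z = one_sample_z(sample_mean=81.2, mu0=75.5, sigma=10.2, n=15)
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha

reject_h0 = z > z_crit
print(f"z = {z:.3f}, z_alpha = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic exceeds the critical value, which supports the claim that speeds on that Sunday were higher.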
Distribution statistics in hypothesis testing
Ques. No. 3: Two samples of speed data are collected as
follows:
For sample 1, mean speed is 74.3 km/hr. and std. dev. is 7 km/hr. (n1 = 120)
For sample 2, mean speed is 72.5 km/hr. and std. dev. is 8 km/hr. (n2 = 120)
Is there any evidence that the mean speed reduced by
more than 0.5 km/hr. between these samples? Assume a level of
significance of 10%.
Solution
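A possible worked solution, as a Python sketch. Assumptions not stated on the slide: sample 1 is the earlier sample, the samples are large enough for the sample std. deviations to stand in for the population values, and the hypotheses are H0: µ1 − µ2 = 0.5 against H1: µ1 − µ2 > 0.5. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, delta):
    """z = (mean1 - mean2 - delta) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    return (mean1 - mean2 - delta) / sqrt(sd1**2 / n1 + sd2**2 / n2)


alpha = 0.10
z = two_sample_z(mean1=74.3, mean2=72.5, sd1=7, sd2=8, n1=120, n2=120, delta=0.5)
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha

reject_h0 = z > z_crit
print(f"z = {z:.3f}, z_alpha = {z_crit:.3f}, reject H0: {reject_h0}")
```

The statistic is only slightly above the critical value, so the conclusion is sensitive to the chosen level of significance.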
Distribution statistics in hypothesis testing
Que. No. 4: For a given vehicle speed data sample of size 20, the
standard deviation observed was 12.5 km/hr. The data can be
used only if the standard deviation is approximately
equal to 10 km/hr. Check whether the data can be accepted at the 5%
level of significance.
Solution
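A possible worked solution, as a Python sketch. Assumptions not stated on the slide: a two-sided test with H0: σ = 10 and H1: σ ≠ 10, and critical values for 19 degrees of freedom hard-coded from a standard chi-square table because the Python standard library has no chi-square quantile function. The function name is illustrative.

```python
def chi_square_statistic(n, s, sigma0):
    """chi^2 = (n - 1) * s^2 / sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * s**2 / sigma0**2


chi2 = chi_square_statistic(n=20, s=12.5, sigma0=10)

# Table values for 19 degrees of freedom (hard-coded assumption):
CHI2_LOWER = 8.907   # chi^2_{0.975, 19}
CHI2_UPPER = 32.852  # chi^2_{0.025, 19}

reject_h0 = chi2 < CHI2_LOWER or chi2 > CHI2_UPPER
print(f"chi^2 = {chi2:.4f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic lies between the two critical values, so H0 is not rejected and the data can be accepted.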
Distribution statistics in hypothesis testing
Que. No. 5: It is desired to determine whether there is less
variability in the speed data collected for day 1 than for day 2.
Independent random samples are taken for these two days as
below:
For day 1: std. dev. = 12 km/hr.; sample size = 12
For day 2: std. dev. = 10 km/hr.; sample size = 14
Test the given hypothesis at a level of significance of 5%.
Solution
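A possible worked solution, as a Python sketch. Assumptions not stated on the slide: H0: σ1² = σ2² against H1: σ1² < σ2², tested with F = s2²/s1² on (n2 − 1, n1 − 1) degrees of freedom. The critical value passed in below (about 2.76 for 13 and 11 degrees of freedom) is an approximate figure interpolated from standard F tables and should be checked against a table; the function name is illustrative.

```python
def f_test_less_variability(s1, n1, s2, n2, f_critical):
    """Test H1: sigma1^2 < sigma2^2 using F = s2^2 / s1^2.

    Returns the statistic, its degrees of freedom, and whether H0 is rejected.
    """
    f_stat = s2**2 / s1**2
    dof = (n2 - 1, n1 - 1)
    return f_stat, dof, f_stat > f_critical


# f_critical is an approximate table value (assumption), not computed here
f_stat, dof, reject_h0 = f_test_less_variability(
    s1=12, n1=12, s2=10, n2=14, f_critical=2.76
)
print(f"F = {f_stat:.3f} with {dof} degrees of freedom, reject H0: {reject_h0}")
```

Because the sample std. dev. for day 1 is larger than for day 2, the statistic is below 1 and cannot exceed an upper-tail critical value greater than 1, so these samples give no evidence of less variability on day 1.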
Distribution statistics – Hypothesis testing
Que. No. 6: Vehicle count data was collected every minute for a period of 65
minutes. Determine, at the 95% confidence level, whether the data follows a
Poisson distribution.

No. of arrivals    Observed frequency
0                  2
1                  6
2                  7
3                  12

To test the fit of data to a particular distribution,
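Summary of test statistics for Hypothesis testing

Two means
H0: $\mu_1 - \mu_2 = \delta$
Test statistic: $z = \frac{\bar{X}_1 - \bar{X}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
H1: $\mu_1 - \mu_2 < \delta$; reject H0 if $z < -z_{\alpha}$
H1: $\mu_1 - \mu_2 > \delta$; reject H0 if $z > z_{\alpha}$
H1: $\mu_1 - \mu_2 \ne \delta$; reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

One variance
H0: $\sigma^2 = \sigma_0^2$
Test statistic: $\chi^2 = \frac{(n - 1)\, s^2}{\sigma_0^2}$ (n − 1 degrees of freedom)
H1: $\sigma^2 < \sigma_0^2$; reject H0 if $\chi^2 < \chi^2_{1-\alpha}$
H1: $\sigma^2 > \sigma_0^2$; reject H0 if $\chi^2 > \chi^2_{\alpha}$
H1: $\sigma^2 \ne \sigma_0^2$; reject H0 if $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$

Two variance
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 > \sigma_2^2$; test statistic $F = \frac{s_1^2}{s_2^2}$; reject H0 if $F > F_{\alpha,\, n_1 - 1,\, n_2 - 1}$
H1: $\sigma_1^2 < \sigma_2^2$; test statistic $F = \frac{s_2^2}{s_1^2}$; reject H0 if $F > F_{\alpha,\, n_2 - 1,\, n_1 - 1}$
H1: $\sigma_1^2 \ne \sigma_2^2$; test statistic $F = \frac{s_{large}^2}{s_{small}^2}$; reject H0 if $F > F_{\alpha/2,\, n_{large} - 1,\, n_{small} - 1}$

Underlying distribution
H0: Data follows given distribution
Test statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
H1: Data does not follow given distribution; reject H0 if $\chi^2 > \chi^2_{\alpha}$

Hint: µ0 = population mean
σ0 = population std. dev.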
Thank You