Statistical Methods
LECTURE 1
Normal Distribution
The normal distribution is one of the most important continuous probability distributions used
in statistics. It is also called the "Gaussian distribution".
The normal distribution is used to describe (model) continuous random variables that tend
to be concentrated around an average value. The normal curve therefore has the shape shown in the figure.
[Figure: the normal curve]
Examples of random variables commonly modeled with the normal distribution are:
The cholesterol levels of patients.
The body temperatures of people.
The weights of pregnant women.
The lifetimes of medical devices (systems).
The normal distribution has the following properties:
(1) The normal curve is a bell-shaped curve.
(2) The total area under the normal curve is equal to 1 or 100%.
(3) The normal curve is unimodal. That is, it has only one mode.
(4) The normal curve is continuous. That is, there are no gaps in the curve.
(5) The normal curve is symmetric. That is, the areas on both sides of the mean are equal.
(6) The mean, median and mode are equal and located at the center of the distribution.
(7) The normal curve never touches the x-axis, but it gets closer and closer to the x-axis in both tails.
(8) The normal distribution depends on two parameters: the mean $\mu$ and the variance $\sigma^2$.
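Several of these properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the mean of 100 and standard deviation of 15 are illustrative assumptions, not values from the lecture.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu, sigma)

# Property (2): the total area under the curve is 1.
# Approximate it with the trapezoidal rule over mu +/- 8 sigma.
steps = 4000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
h = (hi - lo) / steps
area = sum(dist.pdf(lo + i * h) for i in range(1, steps)) * h
area += 0.5 * h * (dist.pdf(lo) + dist.pdf(hi))

# Property (5): symmetry about the mean.
left_area = dist.cdf(mu)  # area to the left of the mean
symmetric = abs(dist.pdf(mu - 20) - dist.pdf(mu + 20)) < 1e-12

# Property (6): mean, median and mode coincide.
centres = (dist.mean, dist.median, dist.mode)

# Property (7): far in the tail the curve is tiny but still positive.
tail = dist.pdf(mu + 6 * sigma)

print(round(area, 6), left_area, symmetric, centres, tail > 0)
```

The integration range of eight standard deviations on each side is an arbitrary choice; the area outside it is negligible at this precision.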
Skewed Distributions
When the data values under study are evenly distributed around the mean, the distribution
of these data values is said to be a "symmetric distribution".
When the majority of the data values fall to the left or right of the mean, the distribution
of these data values is said to be a "skewed distribution".
There are two basic types of skewed distributions: positively skewed (skewed to the right) and
negatively skewed (skewed to the left).
Example 1
A survey of 18 high-technology firms showed the number of days’ inventory they had on hand.
5 29 34 44 45 63 68 74 74
81 88 91 97 98 113 118 151 155
Determine whether the data set is approximately normally distributed.
Solution
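The sample standard deviation is
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{140{,}646 - 18(79.3)^2}{18-1}} = \sqrt{\frac{27{,}453.18}{17}} = \sqrt{1614.89} \approx 40.2 \text{ days}.$$
Then, the Pearson coefficient (PC) of skewness is given by
$$PC = \frac{3(\bar{x} - \text{median})}{s} = \frac{3(79.3 - 77.5)}{40.2} = \frac{5.4}{40.2} = 0.134 \in (-1, 1).$$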
Finally, the histogram is approximately bell-shaped and the data set is not significantly skewed.
It can therefore be concluded that the data set is approximately normally distributed.
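The calculation in Example 1 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_skewness` is a hypothetical helper introduced here for illustration.

```python
from math import sqrt
from statistics import mean, median

def pearson_skewness(data):
    """Return (mean, median, s, PC) for a list of numbers."""
    n = len(data)
    x_bar = mean(data)
    med = median(data)
    # Shortcut formula: s^2 = (sum of x^2 - n * x_bar^2) / (n - 1)
    s = sqrt((sum(x * x for x in data) - n * x_bar ** 2) / (n - 1))
    pc = 3 * (x_bar - med) / s
    return x_bar, med, s, pc

# Days of inventory on hand for the 18 firms in Example 1.
days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
        81, 88, 91, 97, 98, 113, 118, 151, 155]

x_bar, med, s, pc = pearson_skewness(days)
print(f"mean={x_bar:.2f} median={med} s={s:.2f} PC={pc:.3f}")
print("not significantly skewed" if -1 < pc < 1 else "significantly skewed")
```

Because the script keeps the unrounded mean (79.33 rather than 79.3), it gives s of about 40.1 and PC of about 0.137, slightly different from the hand calculation but leading to the same conclusion.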
Example 2
The data shown consist of the number of games played each year in a famous Baseball Hall.
132 148 152 135 151 152 159 70 34
162 130 162 163 143 67 112 142
Check the data set for normality.
Solution
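The sample standard deviation is
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{311{,}502 - 17(130.24)^2}{17-1}} = \sqrt{\frac{23{,}140.22}{16}} = \sqrt{1446.26} \approx 38.03 \text{ games}.$$
Then, the Pearson coefficient (PC) of skewness is given by
$$PC = \frac{3(\bar{x} - \text{median})}{s} = \frac{3(130.24 - 143)}{38.03} = \frac{-38.28}{38.03} = -1.01 \notin (-1, 1).$$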
Finally, the histogram is not approximately bell-shaped and the data set is significantly skewed to
the left. It can therefore be concluded that the data set is not approximately normally distributed.
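The left skew in Example 2 can be seen from a rough text histogram. The sketch below assumes classes of width 30 starting at 30; the lecture does not state which class limits its histogram used, so these are illustrative choices.

```python
# Games played per year, from Example 2.
games = [132, 148, 152, 135, 151, 152, 159, 70, 34,
         162, 130, 162, 163, 143, 67, 112, 142]

# Assumed class limits (not given in the lecture): width 30, starting at 30.
start, width, n_bins = 30, 30, 5

counts = [0] * n_bins
for g in games:
    counts[(g - start) // width] += 1  # index of the class containing g

for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo:3d}-{lo + width - 1:3d} | {'*' * c}")
```

Most of the values fall in the two highest classes, with a thin tail of low values, which is the pattern of a distribution skewed to the left.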
Sampling Distributions
Sampling is the process of taking (drawing) all possible samples from a given population.
Sampling distribution is a distribution consisting of the means of all possible samples of a
specific size taken from a given population and the corresponding frequencies.
There are two basic types of sampling:
(1) Sampling with replacement is sampling in which each selected element is returned to the
population before the next draw, so an element may be selected (chosen) more than once.
(2) Sampling without replacement is sampling in which each element of the given
population can be selected (chosen) at most once.
Sampling error is the difference between the population measure and the corresponding
sample measure.
Remark
When the sampling error is large, the sample does not represent the population well.
When the sampling error is small, the sample represents the population well.
Basic Rules
Suppose that we have a given population of size $N$ with mean $\mu$ and variance $\sigma^2$. If we select
(choose) all possible samples of size $n$ from this population, and $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the means
of all random samples selected from this population, then we have the following.
The population mean is given by the formula
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}.$$
The population variance is given by the formula
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}.$$
The mean of the sample means, say $\mu_{\bar{X}}$, is given by the formula
$$\mu_{\bar{X}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_k}{k} = \frac{\sum \bar{x}_i}{k}.$$
The variance of the sample means, say $\sigma_{\bar{X}}^2$, is given by the formula
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{X}})^2}{k},$$
where $k$ is the number of samples taken from the given population.
Remark
If the sampling is with replacement, then the number of samples is $k = N^n$.
If the sampling is without replacement, then the number of samples is $k = {}_N C_n = \frac{N!}{n!\,(N-n)!}$.
The symbol $\sigma_{\bar{X}}$ is called the "standard deviation of the sample means" (standard error of the mean).
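The two counting rules can be checked by listing the samples directly. The sketch below uses a hypothetical five-element population and samples of size 3; `itertools.product` enumerates ordered samples with replacement and `itertools.combinations` enumerates unordered samples without replacement.

```python
from itertools import combinations, product
from math import comb

# Hypothetical population, used only to illustrate the counting rules.
population = ["A", "B", "C", "D", "E"]
N, n = len(population), 3

with_repl = list(product(population, repeat=n))    # ordered, repeats allowed
without_repl = list(combinations(population, n))   # unordered, no repeats

print(len(with_repl), N ** n)         # number of samples with replacement vs N^n
print(len(without_repl), comb(N, n))  # number without replacement vs NCn
```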
Example 3
Suppose that a professor gave an 8-point quiz to a small class of four students. The results of the
quiz were 2, 6, 4, and 8. Assume that the four students constitute the population and all possible
samples of size 2 are taken with replacement from this population. Find each of the following:
(a) The population mean.
(b) The population standard deviation.
(c) The mean of the sample means.
(d) The standard deviation of the sample means.
Solution
(c) All possible samples of size 2 taken with replacement from the population are:
Sample    Mean      Sample    Mean
(2, 2)    2         (4, 2)    3
(2, 6)    4         (4, 6)    5
(2, 4)    3         (4, 4)    4
(2, 8)    5         (4, 8)    6
(6, 2)    4         (8, 2)    5
(6, 6)    6         (8, 6)    7
(6, 4)    5         (8, 4)    6
(6, 8)    7         (8, 8)    8
Then, the mean of the sample means is given as
$$\mu_{\bar{X}} = \frac{\sum \bar{x}_i}{k} = \frac{2 + 4 + 3 + 5 + \cdots + 6 + 8}{16} = \frac{80}{16} = 5 \text{ points}.$$
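Only part (c) of the solution is shown here. As a rough check, all four quantities asked for in Example 3 can be computed by enumerating the samples, as in the sketch below. It uses `fmean` and `pstdev` from the Python standard library, where `pstdev` divides by the number of values, matching the population formulas in the Basic Rules.

```python
from itertools import product
from statistics import fmean, pstdev

scores = [2, 6, 4, 8]  # the four quiz results, treated as the population
n = 2                  # sample size

mu = fmean(scores)      # (a) population mean
sigma = pstdev(scores)  # (b) population standard deviation (divides by N)

# All N^n ordered samples drawn with replacement, and their means.
sample_means = [fmean(s) for s in product(scores, repeat=n)]

mu_xbar = fmean(sample_means)      # (c) mean of the sample means
sigma_xbar = pstdev(sample_means)  # (d) standard deviation of the sample means

print(mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 3))
```

The script gives a population standard deviation of about 2.236 and a standard deviation of the sample means of about 1.581, the same two values that appear in the verification step of Example 5.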
Remark
If the sampling is conducted without replacement, then the sampling distribution of the
sample means has the following two properties:
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}.$$
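The square-root factor in the second property shrinks the standard error when the sample is a sizeable fraction of the population, and it approaches 1 as the population grows. The sketch below illustrates this; the function name `standard_error` and the values of sigma, n and N are hypothetical choices for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    If a population size N is supplied, the factor sqrt((N - n) / (N - 1))
    for sampling without replacement is applied.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma = 10.0  # hypothetical population standard deviation
n = 25        # hypothetical sample size

for N in (50, 500, 5000, 50000):
    print(N, round(standard_error(sigma, n, N), 4))
print("no correction:", standard_error(sigma, n))
```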
Example 5
If the sampling in Example 3 is conducted without replacement, find each of the following:
(a) The mean of the sample means?
(b) The standard deviation of the sample means?
(c) Verify the two properties of the sampling distribution?
Solution
Since the sampling is without replacement, the number of samples taken is given by
$$k = {}_N C_n = {}_4 C_2 = \frac{4!}{2!\,(4-2)!} = 6.$$
All possible samples of size 2 taken without replacement from the population are given as:
Sample    Mean      Sample    Mean
(2, 6)    4         (6, 4)    5
(2, 4)    3         (6, 8)    7
(2, 8)    5         (4, 8)    6
Now, we have the following:
(a) The mean of the sample means is given as
$$\mu_{\bar{X}} = \frac{\sum \bar{x}_i}{k} = \frac{4 + 3 + 5 + 5 + 7 + 6}{6} = \frac{30}{6} = 5 \text{ points}.$$
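(b) The standard deviation of the sample means is given as
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{X}})^2}{k} = \frac{(4-5)^2 + (3-5)^2 + (5-5)^2 + (5-5)^2 + (7-5)^2 + (6-5)^2}{6}$$
$$= \frac{1 + 4 + 0 + 0 + 4 + 1}{6} = \frac{10}{6} = 1.667 \text{ (points)}^2.$$
$$\sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \sqrt{1.667} = 1.291 \text{ points}.$$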
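(c) It is clear that:
$$\mu_{\bar{X}} = \mu = 5.$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{2.236}{\sqrt{2}} \sqrt{\frac{4-2}{4-1}} = (1.581)(0.816) = 1.291.$$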
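The results of Example 5 can also be confirmed by brute force. The sketch below lists the six samples with `itertools.combinations` and compares the standard deviation of their means with the right-hand side of the second property; it relies only on the Python standard library.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

scores = [2, 6, 4, 8]  # the population from Example 3
N, n = len(scores), 2

# All NCn unordered samples drawn without replacement, and their means.
sample_means = [fmean(s) for s in combinations(scores, n)]

mu_xbar = fmean(sample_means)      # mean of the sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the sample means

# Right-hand sides of the two properties.
mu = fmean(scores)
rhs = pstdev(scores) / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(sample_means), mu_xbar, mu)
print(round(sigma_xbar, 3), round(rhs, 3))
```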
Completed, praise be to God.