Statistical Methods

The document discusses the normal distribution and how to determine whether a data set is normally distributed. It provides examples to illustrate the key steps: 1) Draw a histogram of the data set and check whether it is approximately bell-shaped. 2) Calculate the Pearson coefficient of skewness. If it is between -1 and 1, the data set is not significantly skewed and may be approximately normal. The examples analyze data sets on inventory levels and baseball games: the inventory data show low skewness and are approximately normally distributed, while the baseball data are significantly skewed to the left and are not.

Uploaded by

Layla

Mr. Mohamed El-Sayed El-Dawoody, Lecturer of Mathematical Statistics

LECTURE 1

Normal Distribution

Normal Distribution
• The normal distribution is one of the most important continuous probability distributions used in statistics. It is also called the "Gaussian distribution".
• The normal distribution is used to describe (model) continuous random variables that tend to be concentrated around an average value. The normal curve therefore has a bell shape.
[Figure: the bell-shaped normal curve]

• Examples of random variables that are often modeled with the normal distribution:
  - The cholesterol levels of patients.
  - The body temperatures of people.
  - The weights of pregnant women.
  - The lifetimes of medical devices (systems).
• The normal distribution has the following properties:
(1) The normal curve is a bell-shaped curve.
(2) The total area under the normal curve is equal to 1 or 100%.
(3) The normal curve is unimodal. That is, it has only one mode.
(4) The normal curve is continuous. That is, there are no gaps in the curve.
(5) The normal curve is symmetric about the mean. That is, the areas on both sides of the mean are equal.
(6) The mean, median and mode are equal and located at the center of the distribution.
(7) The normal curve never touches the x-axis, but it gets closer and closer to it as x moves away from the mean.
(8) The normal distribution depends on two parameters: the mean $\mu$ and the variance $\sigma^2$.
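Several of these properties can be checked numerically. The sketch below is only an illustration: it assumes the standard normal density formula (which the notes do not state), uses hypothetical parameter values, and approximates areas with a simple trapezoidal rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(mu, sigma, lo, hi, steps=20_000):
    """Approximate the area under the curve on [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

mu, sigma = 100.0, 15.0                      # hypothetical parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma    # wide enough to hold almost all the area

total_area = area(mu, sigma, lo, hi)         # property (2): close to 1
left_area = area(mu, sigma, lo, mu)          # property (5): close to 0.5
symmetric = math.isclose(normal_pdf(mu - 7, mu, sigma),
                         normal_pdf(mu + 7, mu, sigma))
peak_at_mean = normal_pdf(mu, mu, sigma) > normal_pdf(mu + 1, mu, sigma)  # property (6)
tail_height = normal_pdf(mu + 6 * sigma, mu, sigma)  # property (7): tiny but positive

print(round(total_area, 6), round(left_area, 6), symmetric, peak_at_mean, tail_height > 0)
```

Changing `mu` shifts the curve and changing `sigma` stretches it, which is what property (8) means by the distribution depending on these two parameters.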
Skewed Distributions
• When the data values under study are evenly distributed around the mean, the distribution of these data values is said to be "symmetric".
• When the majority of the data values fall to the left or to the right of the mean, the distribution of these data values is said to be "skewed".
• There are two basic types of skewed distributions:

LECTURE 1 PAGE 1 STAT 2040



(1) Positively (right) skewed distribution

• A distribution in which the majority of the data values fall to the left of the mean.
• That is, the mean falls to the right of the median, and both fall to the right of the mode (mode < median < mean).
• The positively (right) skewed distribution has the following shape:
[Figure: positively (right) skewed curve, with the long tail on the right]

(2) Negatively (left) skewed distribution

• A distribution in which the majority of the data values fall to the right of the mean.
• That is, the mean falls to the left of the median, and both fall to the left of the mode (mean < median < mode).
• The negatively (left) skewed distribution has the following shape:
[Figure: negatively (left) skewed curve, with the long tail on the left]
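The ordering of the mode, median and mean in the two cases can be illustrated with a small sketch. The data below are hypothetical values chosen only to have a long tail on one side; they do not come from the notes.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]   # hypothetical data, long right tail
left_skewed = [15 - x for x in right_skewed]      # mirror image, long left tail

def centre_measures(data):
    """Return (mode, median, mean) of a data set."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

r_mode, r_median, r_mean = centre_measures(right_skewed)
l_mode, l_median, l_mean = centre_measures(left_skewed)

print("right skewed:", r_mode, r_median, r_mean)   # mode < median < mean
print("left skewed: ", l_mode, l_median, l_mean)   # mean < median < mode
```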

Determining the normality of a data set

• Many statistical methods require that the data values under study be normally or approximately normally distributed.
• For this reason, it is important to determine whether the data values are normally or approximately normally distributed.
• There are two basic steps for checking the normality of a data set:
(1) Draw a histogram for the data set and check its shape:
  - If it is approximately bell-shaped, the data set may be approximately normally distributed.
  - If it is not approximately bell-shaped, the data set is not normally distributed.
(2) Check the skewness using the Pearson coefficient (PC) of skewness, which is given by
$$PC = \frac{3(\bar{x} - \text{median})}{s},$$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation, such that:
  - If PC is between –1 and 1, then the data set is not significantly skewed.
  - If PC is less than or equal to –1, then the data set is significantly skewed to the left.
  - If PC is greater than or equal to 1, then the data set is significantly skewed to the right.
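The skewness check in step (2) can be sketched in a few lines. The function names `pearson_skewness` and `classify` and the two small data sets are hypothetical, introduced only for illustration; the standard deviation is the sample one (divisor n − 1), as in the worked examples.

```python
import statistics

def pearson_skewness(data):
    """Pearson coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)          # sample standard deviation (divisor n - 1)
    return 3 * (mean - median) / s

def classify(pc):
    """Apply the decision rule for the Pearson coefficient."""
    if pc <= -1:
        return "significantly skewed to the left"
    if pc >= 1:
        return "significantly skewed to the right"
    return "not significantly skewed"

symmetric_sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]   # hypothetical, symmetric about 4
lopsided_sample = [1, 1, 1, 2, 10]               # hypothetical, one large value

for sample in (symmetric_sample, lopsided_sample):
    pc = pearson_skewness(sample)
    print(round(pc, 3), classify(pc))
```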


Example 1

A survey of 18 high-technology firms showed the number of days’ inventory they had on hand.
5 29 34 44 45 63 68 74 74
81 88 91 97 98 113 118 151 155
Determine whether the data set is approximately normally distributed.
Solution

(1) Draw the histogram of the data set as follows:

• Range (R) = Largest value – Smallest value = 155 – 5 = 150.
• Number of classes (K) = $1 + 3.322\log_{10}(n) = 1 + 3.322\log_{10}(18) = 5.17 \approx 6$ classes (rounded up).
• Class width (W) = Range / Number of classes = 150 / 6 = 25.
Now, the grouped frequency distribution and the histogram are given as follows. With a class width of 25 starting at 5, the sixth class ends at 154, so a seventh class is needed to include the largest value, 155.
Class Limits    Class Boundaries    Frequency
5–29            4.5–29.5            2
30–54           29.5–54.5           3
55–79           54.5–79.5           4
80–104          79.5–104.5          5
105–129         104.5–129.5         2
130–154         129.5–154.5         1
155–179         154.5–179.5         1
Total           –                   18
[Figure: histogram of the grouped data]
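The grouping in step (1) can be reproduced with a short sketch. It assumes integer data, rounds both the number of classes and the class width up, and starts the first class at the smallest value; these choices match the table above but are assumptions about the intended procedure.

```python
import math

data = [5, 29, 34, 44, 45, 63, 68, 74, 74,
        81, 88, 91, 97, 98, 113, 118, 151, 155]

n = len(data)
data_range = max(data) - min(data)
k = math.ceil(1 + 3.322 * math.log10(n))   # number of classes, rounded up
width = math.ceil(data_range / k)          # class width, rounded up

# Build classes of equal width from the smallest value until the largest is covered.
classes = []
lower = min(data)
while lower <= max(data):
    upper = lower + width - 1
    freq = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    lower = upper + 1

# Print the table with a crude text histogram (one star per observation).
for lo, hi, b_lo, b_hi, freq in classes:
    print(f"{lo}-{hi}\t{b_lo}-{b_hi}\t{freq}\t{'*' * freq}")
```

The loop keeps adding classes until the largest value is covered, which is why seven classes appear even though the rule suggests six.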
(2) Check the skewness of the data set as follows:
• The mean of the data set is given by
$$\bar{x} = \frac{\sum x_i}{n} = \frac{5 + 29 + 34 + \dots + 151 + 155}{18} = \frac{1428}{18} = 79.3 \text{ days}.$$
• The median of the data set is given by
$$\text{Med} = \frac{X_{(n/2)} + X_{(n/2+1)}}{2} = \frac{X_{(9)} + X_{(10)}}{2} = \frac{74 + 81}{2} = \frac{155}{2} = 77.5 \text{ days}.$$
• The standard deviation of the data set is given by
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{140{,}646 - 18(79.3)^2}{18-1}} = \sqrt{\frac{27{,}453.18}{17}} = \sqrt{1614.89} = 40.2 \text{ days}.$$
Then, the Pearson coefficient (PC) of skewness is given by
$$PC = \frac{3(\bar{x} - \text{median})}{s} = \frac{3(79.3 - 77.5)}{40.2} = \frac{5.4}{40.2} = 0.134 \in (-1, 1).$$
Finally, the histogram is approximately bell-shaped and the data set is not significantly skewed.
It can therefore be concluded that the data set is approximately normally distributed.
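The hand calculation for Example 1 can be cross-checked with a short sketch that follows the same shortcut formula for the standard deviation. Variable names are illustrative. The results may differ slightly in the last digit from the figures above, because the hand calculation rounds the mean to 79.3 before squaring it.

```python
import math
import statistics

inventory_days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
                  81, 88, 91, 97, 98, 113, 118, 151, 155]

n = len(inventory_days)
mean = sum(inventory_days) / n                      # 1428 / 18
median = statistics.median(inventory_days)          # average of the 9th and 10th values
sum_sq = sum(x * x for x in inventory_days)         # sum of squared values

# Shortcut formula: s = sqrt((sum of x^2 - n * mean^2) / (n - 1))
s = math.sqrt((sum_sq - n * mean ** 2) / (n - 1))
pc = 3 * (mean - median) / s

print(f"mean={mean:.1f}  median={median}  s={s:.1f}  PC={pc:.3f}")
print("not significantly skewed" if -1 < pc < 1 else "significantly skewed")
```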


Example 2

The data shown consist of the number of games played each year in a famous Baseball Hall.
132 148 152 135 151 152 159 70 34
162 130 162 163 143 67 112 142
Check the data set for normality.
Solution

(1) Draw the histogram of the data set as follows:

• Range (R) = Largest value – Smallest value = 163 – 34 = 129.
• Number of classes (K) = $1 + 3.322\log_{10}(n) = 1 + 3.322\log_{10}(17) = 5.09 \approx 6$ classes (rounded up).
• Class width (W) = Range / Number of classes = 129 / 6 = 21.5 ≈ 22.
Now, the grouped frequency distribution and the histogram are given as follows:
Class Limits    Class Boundaries    Frequency
34–55           33.5–55.5           1
56–77           55.5–77.5           2
78–99           77.5–99.5           0
100–121         99.5–121.5          1
122–143         121.5–143.5         5
144–165         143.5–165.5         8
Total           –                   17
[Figure: histogram of the grouped data]

(2) Check the skewness of the data set as follows:

• The mean of the data set is given by
$$\bar{x} = \frac{\sum x_i}{n} = \frac{132 + 148 + 152 + \dots + 112 + 142}{17} = \frac{2214}{17} = 130.24 \text{ games}.$$
• The median of the data set is given by
$$\text{Med} = X_{\left(\frac{n+1}{2}\right)} = X_{\left(\frac{18}{2}\right)} = X_{(9)} = 143 \text{ games}.$$
• The standard deviation of the data set is given by
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{311{,}502 - 17(130.24)^2}{17-1}} = \sqrt{\frac{23{,}140.22}{16}} = \sqrt{1446.26} = 38.03 \text{ games}.$$
Then, the Pearson coefficient (PC) of skewness is given by
$$PC = \frac{3(\bar{x} - \text{median})}{s} = \frac{3(130.24 - 143)}{38.03} = \frac{-38.28}{38.03} = -1.01 \notin (-1, 1).$$
Finally, the histogram is not approximately bell-shaped and the data set is significantly skewed to
the left. It can therefore be concluded that the data set is not approximately normally distributed.
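Both steps of Example 2 can be cross-checked together. The sketch below copies the class limits from the frequency table in the notes and uses Python's `statistics` module; the variable names are illustrative, and the check that the tallest bar sits in the last class is only a rough, assumed proxy for "not bell-shaped".

```python
import statistics

games = [132, 148, 152, 135, 151, 152, 159, 70, 34,
         162, 130, 162, 163, 143, 67, 112, 142]

# Class limits copied from the grouped frequency table in the notes.
class_limits = [(34, 55), (56, 77), (78, 99), (100, 121), (122, 143), (144, 165)]
frequencies = [sum(lo <= x <= hi for x in games) for lo, hi in class_limits]

mean = statistics.mean(games)
median = statistics.median(games)   # n = 17 is odd, so this is the 9th ordered value
s = statistics.stdev(games)         # sample standard deviation (divisor n - 1)
pc = 3 * (mean - median) / s

# A bell shape peaks near the middle; here the tallest bar is the last class.
tallest_is_last = frequencies.index(max(frequencies)) == len(frequencies) - 1

print(frequencies)
print(f"mean={mean:.2f}  median={median}  s={s:.2f}  PC={pc:.2f}")
print("tallest class is the last one:", tallest_is_last)
```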


Sampling Distributions
• Sampling is the process of taking (drawing) samples from a given population.
• A sampling distribution of the sample means is a distribution consisting of the means of all possible samples of a specific size taken from a given population, together with their corresponding frequencies.
• There are two basic types of sampling:
(1) Sampling with replacement: each selected element is returned to the population before the next draw, so an element may be selected more than once.
(2) Sampling without replacement: a selected element is not returned to the population, so each element can be selected at most once.
• Sampling error is the difference between a population measure and the corresponding sample measure.
Remark
• When the sampling error is large, the sample does not represent the population well.
• When the sampling error is small, the sample represents the population well.
Basic Rules
Suppose that we have a population of size $N$ with mean $\mu$ and variance $\sigma^2$. If we select
(choose) all possible samples of size $n$ from this population, and $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ are the means
of these samples, then we have:
• The population mean is given by the formula
$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x_i}{N}.$$
• The population variance is given by the formula
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}.$$
• The mean of the sample means, say $\mu_{\bar{X}}$, is given by the formula
$$\mu_{\bar{X}} = \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_k}{k} = \frac{\sum \bar{x}_i}{k}.$$
• The variance of the sample means, say $\sigma_{\bar{X}}^2$, is given by the formula
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{X}})^2}{k},$$
where $k$ is the number of samples taken from the given population.


Remark
• If the sampling is with replacement, then the number of samples $k$ is given by $k = N^n$.
• If the sampling is without replacement, then the number of samples $k$ is given by $k = {}^{N}C_{n}$.
• The symbol $\sigma_{\bar{X}}$ is called the "standard deviation of the sample means" (the standard error of the mean).
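The two counting rules can be checked by listing the samples explicitly. This is a sketch with a hypothetical four-element population; it treats samples with replacement as ordered and samples without replacement as unordered, which is the convention implied by the formulas $N^n$ and ${}^{N}C_{n}$.

```python
import itertools
import math

population = ["A", "B", "C", "D"]   # hypothetical population, N = 4
n = 2                               # sample size
N = len(population)

# With replacement: every ordered selection, elements may repeat.
with_replacement = list(itertools.product(population, repeat=n))

# Without replacement: unordered selections of distinct elements.
without_replacement = list(itertools.combinations(population, n))

print(len(with_replacement), N ** n)                # both counts should agree
print(len(without_replacement), math.comb(N, n))    # both counts should agree
```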
Example 3

Suppose that a professor gave an 8-point quiz to a small class of four students. The results of the
quiz were 2, 6, 4, and 8. Assume that the four students constitute the population and all possible
samples of size 2 are taken with replacement from this population. Find each of the following:
(a) The population mean.
(b) The population standard deviation.
(c) The mean of the sample means.
(d) The standard deviation of the sample means.
Solution

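(a) The population mean is given as
$$\mu = \frac{\sum x_i}{N} = \frac{2 + 6 + 4 + 8}{4} = \frac{20}{4} = 5 \text{ points}.$$
(b) The population standard deviation is given as
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(2-5)^2 + (6-5)^2 + (4-5)^2 + (8-5)^2}{4} = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5 \text{ (points)}^2,$$
$$\therefore\ \sigma = \sqrt{\sigma^2} = \sqrt{5} = 2.236 \text{ points}.$$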

(c) All possible samples of size 2 taken with replacement from the population are given as:
Sample    Mean    Sample    Mean
(2, 2)    2       (4, 2)    3
(2, 6)    4       (4, 6)    5
(2, 4)    3       (4, 4)    4
(2, 8)    5       (4, 8)    6
(6, 2)    4       (8, 2)    5
(6, 6)    6       (8, 6)    7
(6, 4)    5       (8, 4)    6
(6, 8)    7       (8, 8)    8
Then, the mean of the sample means is given as
$$\mu_{\bar{X}} = \frac{\sum \bar{x}_i}{k} = \frac{2 + 4 + 3 + 5 + \dots + 6 + 8}{16} = \frac{80}{16} = 5 \text{ points}.$$


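(d) The standard deviation of the sample means is given as
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{X}})^2}{k} = \frac{(2-5)^2 + (4-5)^2 + (3-5)^2 + \dots + (8-5)^2}{16} = \frac{9 + 1 + 4 + \dots + 1 + 9}{16} = \frac{40}{16} = 2.5 \text{ (points)}^2,$$
$$\therefore\ \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \sqrt{2.5} = 1.581 \text{ points}.$$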
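The four answers of Example 3 can be reproduced by enumerating the 16 samples directly. This is a sketch with illustrative variable names; `statistics.pstdev` is used because every formula in this example divides by the number of values (N or k), not by one less.

```python
import itertools
import statistics

scores = [2, 6, 4, 8]      # the population of quiz results
n = 2                      # sample size

mu = statistics.mean(scores)             # (a) population mean
sigma = statistics.pstdev(scores)        # (b) population standard deviation

# All ordered samples of size n drawn with replacement, and their means.
samples = list(itertools.product(scores, repeat=n))
sample_means = [statistics.mean(s) for s in samples]

mu_xbar = statistics.mean(sample_means)          # (c) mean of the sample means
sigma_xbar = statistics.pstdev(sample_means)     # (d) standard deviation of the sample means

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 3))
```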

Properties of sampling distribution

(1) The mean of the sample means is equal to the population mean. That is, $\mu_{\bar{X}} = \mu$.
(2) The standard deviation of the sample means is equal to the population standard deviation divided
by the square root of the sample size. That is, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.
(3) The shape of the sampling distribution of the sample means taken with replacement from a
population with mean $\mu$ and standard deviation $\sigma$ approaches a normal distribution as
the sample size increases without limit (Central Limit Theorem).
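Properties (1) and (2) can be checked for several sample sizes by exact enumeration. The sketch below reuses the four quiz scores from Example 3 as the population and assumes sampling with replacement; it does not attempt to test property (3), which concerns the limiting shape.

```python
import itertools
import math
import statistics

population = [2, 6, 4, 8]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)     # population standard deviation (divisor N)

rows = []
for n in range(1, 5):
    # Means of every ordered sample of size n drawn with replacement.
    means = [sum(s) / n for s in itertools.product(population, repeat=n)]
    mu_xbar = statistics.mean(means)
    sigma_xbar = statistics.pstdev(means)
    rows.append((n, len(means), mu_xbar, sigma_xbar, sigma / math.sqrt(n)))

print("n   k    mean of means   sd of means   sigma/sqrt(n)")
for n, k, mu_xbar, sigma_xbar, predicted in rows:
    print(f"{n}   {k:<4} {mu_xbar:<15.3f} {sigma_xbar:<13.3f} {predicted:.3f}")
```

The last two columns should agree for every n, and both shrink as n grows, which is why larger samples give more precise estimates of the mean.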
Example 4

Use the data given in Example 3 to answer the following questions:

(a) Find the sampling distribution of the sample means.
(b) Verify the three properties of the sampling distribution.
Solution

(a) The sampling distribution of the sample means is given as

Sample Means    2    3    4    5    6    7    8    Total
Frequency       1    2    3    4    3    2    1    16
(b) It is clear that:
• $\mu_{\bar{X}} = \mu = 5$.
• $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.236}{\sqrt{2}} = 1.581$ [smaller than $\sigma$].
• The histogram of the sample means appears to be approximately normal as follows:
[Figure: histogram of the sample means]
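The frequency table of the sample means can be tabulated with a counter, and a row of stars per value gives a rough text version of the histogram. This is a sketch with illustrative variable names.

```python
from collections import Counter
from itertools import product

scores = [2, 6, 4, 8]

# Means of all 16 ordered samples of size 2 drawn with replacement.
sample_means = [sum(s) / len(s) for s in product(scores, repeat=2)]
frequency = Counter(sample_means)

print("Mean\tFreq")
for value in sorted(frequency):
    print(f"{value:g}\t{frequency[value]}\t{'*' * frequency[value]}")
print("Total\t", sum(frequency.values()))
```

The star counts rise to a single peak at 5 and fall symmetrically on both sides, which is the shape the notes describe as approximately normal.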


Remark
If the sampling is conducted without replacement, then the sampling distribution of the sample means
has only the following two properties:
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$
Example 5

If the sampling is conducted without replacement in example 3, find each of the following:
(a) The mean of the sample means.
(b) The standard deviation of the sample means.
(c) Verify the two properties of the sampling distribution.
Solution

Since the sampling is without replacement, the number of samples taken is given by
$$k = {}^{N}C_{n} = {}^{4}C_{2} = \frac{4!}{2!\,(4-2)!} = 6.$$
All possible samples of size 2 taken without replacement from the population are given as:
Sample    Mean    Sample    Mean
(2, 6)    4       (6, 4)    5
(2, 4)    3       (6, 8)    7
(2, 8)    5       (4, 8)    6
Now, we have the following:
(a) The mean of the sample means is given as
$$\mu_{\bar{X}} = \frac{\sum \bar{x}_i}{k} = \frac{4 + 3 + 5 + 5 + 7 + 6}{6} = \frac{30}{6} = 5 \text{ points}.$$
(b) The standard deviation of the sample means is given as
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{X}})^2}{k} = \frac{(4-5)^2 + (3-5)^2 + (5-5)^2 + (5-5)^2 + (7-5)^2 + (6-5)^2}{6} = \frac{1 + 4 + 0 + 0 + 4 + 1}{6} = \frac{10}{6} = 1.667 \text{ (points)}^2,$$
$$\therefore\ \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \sqrt{1.667} = 1.291 \text{ points}.$$
(c) It is clear that:
• $\mu_{\bar{X}} = \mu = 5$.
• $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}} = \dfrac{2.236}{\sqrt{2}} \sqrt{\dfrac{4 - 2}{4 - 1}} = (1.581)(0.816) = 1.291$.
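Example 5 can be cross-checked by enumerating the six samples and comparing the result with the corrected formula. This is a sketch with illustrative variable names; the factor $\sqrt{(N-n)/(N-1)}$ is stored in a variable named `correction` here.

```python
import itertools
import math
import statistics

scores = [2, 6, 4, 8]
N, n = len(scores), 2

# All unordered samples of size n drawn without replacement, and their means.
samples = list(itertools.combinations(scores, n))
sample_means = [statistics.mean(s) for s in samples]

mu_xbar = statistics.mean(sample_means)           # (a)
sigma_xbar = statistics.pstdev(sample_means)      # (b), divisor k

# (c) Value predicted by the formula for sampling without replacement.
correction = math.sqrt((N - n) / (N - 1))
predicted = statistics.pstdev(scores) / math.sqrt(n) * correction

print(len(samples), mu_xbar, round(sigma_xbar, 3), round(correction, 3), round(predicted, 3))
```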
Completed, praise be to God.
