
Outliers Z-Score

The document discusses using Z-scores to identify outliers in a dataset that follows a normal distribution. It explains that a Z-score measures the number of standard deviations an observation lies from the mean, with values beyond ±3 commonly treated as outliers. However, outliers distort the Z-scores themselves by inflating the mean and standard deviation. The document then introduces an alternative method that uses the interquartile range to calculate inner and outer fences. Values beyond an outer fence are major outliers, and values between an inner and an outer fence are minor outliers.

Uploaded by Ana Chikovani
Copyright © All Rights Reserved

Using Z-scores to Detect Outliers

Z-scores can quantify the unusualness of an observation when your data follow the normal distribution. A
Z-score is the number of standard deviations a value falls above or below the mean. For example, a Z-score of 2
indicates that an observation is two standard deviations above the mean, while a Z-score of -2 signifies that it
is two standard deviations below the mean. A Z-score of zero represents a value that equals the mean.
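The definition above can be written as a formula. In this sketch, x is an observation, μ is the mean, and σ is the standard deviation. The example values (μ = 50, σ = 10) are hypothetical. They were chosen only to reproduce the Z-scores of 2 and -2 mentioned above.

```latex
z = \frac{x - \mu}{\sigma}
\qquad \text{e.g. } \mu = 50,\ \sigma = 10: \quad
x = 70 \Rightarrow z = \frac{70 - 50}{10} = 2, \qquad
x = 30 \Rightarrow z = \frac{30 - 50}{10} = -2
```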

The further an observation’s Z-score is from zero, the more unusual it is. A standard cut-off for finding
outliers is a Z-score of ±3 or further from zero. The probability distribution below displays the
distribution of Z-scores in a standard normal distribution. Z-scores beyond ±3 are so extreme that you can
barely see the shading under the curve.
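Here is a minimal Python sketch of the ±3 rule. It assumes the population standard deviation (`statistics.pstdev`). Some tools use the sample standard deviation instead, which gives slightly smaller Z-scores. The function names `z_scores` and `z_score_outliers` and the data are illustrative and do not come from the original.

```python
import statistics


def z_scores(values):
    """Return the Z-score of each value: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; statistics.stdev gives the sample SD
    if sd == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(x - mean) / sd for x in values]


def z_score_outliers(values, cutoff=3.0):
    """Return (value, z) pairs whose Z-score is at least `cutoff` away from zero."""
    return [(x, z) for x, z in zip(values, z_scores(values)) if abs(z) >= cutoff]


# Hypothetical data: nineteen identical readings and one unusual one.
data = [10.0] * 19 + [20.0]
print(z_score_outliers(data))  # one pair: 20.0 with a Z-score of about 4.36
```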

In a population that follows the normal distribution, Z-scores more extreme than ±3 have a probability
of 0.0027 (2 * 0.00135), which is about 1 in 370 observations. However, if your data don’t follow the normal
distribution, this probability no longer applies, and the ±3 cut-off can flag too many or too few values.
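You can check the 0.0027 figure with Python's standard library. This sketch assumes a standard normal distribution and uses the identity P(|Z| > z) = erfc(z / √2). The helper name `two_tailed_probability` is illustrative.

```python
import math


def two_tailed_probability(z):
    """P(|Z| > z) for a standard normal Z."""
    # One tail is erfc(z / sqrt(2)) / 2, so both tails together are erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


p = two_tailed_probability(3)
print(f"P(|Z| > 3) = {p:.4f}")                 # 0.0027
print(f"about 1 in {1 / p:.0f} observations")  # about 1 in 370
```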

Also, note that the outlier’s presence throws off the Z-scores because it inflates the mean and standard
deviation, as we saw earlier. Notice how all the Z-scores are negative except the outlier’s value. If we
calculated the Z-scores without the outlier, they’d be different! Be aware that if your dataset contains
outliers, Z-scores are biased toward zero, so extreme values appear less extreme than they really are.
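The sketch below shows this effect with hypothetical numbers rather than the document's dataset. It computes the suspect value's Z-score twice: once with the mean and population SD of all six values, and once with the mean and SD of the other five only. The helper `z_against` is illustrative.

```python
import statistics


def z_against(reference, x):
    """Z-score of x using the mean and population SD of `reference`."""
    return (x - statistics.mean(reference)) / statistics.pstdev(reference)


# Hypothetical data: five typical values plus one suspect value.
typical = [8, 9, 10, 11, 12]
suspect = 20

with_outlier = z_against(typical + [suspect], suspect)  # outlier inflates mean and SD
without_outlier = z_against(typical, suspect)           # mean and SD of the other values

print(f"Z including the outlier: {with_outlier:.2f}")     # 2.11
print(f"Z excluding the outlier: {without_outlier:.2f}")  # 7.07
```

With the outlier included, its Z-score is about 2.11, which slips under the ±3 cut-off. Measured against the other five values, it is about 7.07.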

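The interquartile range (IQR = Q3 – Q1) offers an alternative: it defines inner and outer fences for outliers.
To calculate the outlier fences, do the following: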

1. Take your IQR and multiply it by 1.5 and 3. We’ll use these values to
obtain the inner and outer fences. For our example, the IQR equals
0.222. Consequently, 0.222 * 1.5 = 0.333 and 0.222 * 3 = 0.666.
We’ll use 0.333 and 0.666 in the following steps.
2. Calculate the inner and outer lower fences. Take the Q1 value and subtract the two values from step 1. The
two results are the lower inner and outer outlier fences. For our example, Q1 is 1.714. So, the lower inner
fence = 1.714 – 0.333 = 1.381 and the lower outer fence = 1.714 – 0.666 = 1.048.
3. Calculate the inner and outer upper fences. Take the Q3 value and add the two values from step 1. The two
results are the upper inner and upper outer fences. For our example, Q3 is 1.936. So, the upper inner fence =
1.936 + 0.333 = 2.269 and the upper outer fence = 1.936 + 0.666 = 2.602.
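Steps 1 through 3 can be sketched in Python as follows. The function `outlier_fences` is a hypothetical helper. It takes Q1 and Q3 as given because the document does not say which quartile method produced 1.714 and 1.936.

```python
def outlier_fences(q1, q3):
    """Return (lower_outer, lower_inner, upper_inner, upper_outer) from Q1 and Q3."""
    iqr = q3 - q1
    inner = 1.5 * iqr  # step 1: IQR * 1.5
    outer = 3.0 * iqr  # step 1: IQR * 3
    # Step 2 subtracts from Q1; step 3 adds to Q3.
    return (q1 - outer, q1 - inner, q3 + inner, q3 + outer)


fences = outlier_fences(q1=1.714, q3=1.936)
print([round(f, 3) for f in fences])  # [1.048, 1.381, 2.269, 2.602]
```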

Using the Outlier Fences with Our Example Dataset

For our example dataset, the values for these fences are 1.048, 1.381, 2.269, and 2.602. Almost all of our data
should fall between the inner fences, which are 1.381 and 2.269. At this point, we look at our data values and
determine whether any qualify as major or minor outliers: a value between an inner and an outer fence is a minor
outlier, and a value beyond an outer fence is a major outlier. Fourteen of the 15 data points fall inside the
inner fences, so they are not outliers. The 15th data point falls outside the upper outer fence, so it is a major
(extreme) outlier.
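The classification rule in this paragraph can be sketched as follows. The helper `classify` is hypothetical. It assumes that a value lying exactly on a fence is not beyond it (strict inequalities), since the original does not say how ties are handled. The sample values 1.80, 2.40, and 2.90 are made up.

```python
def classify(value, fences):
    """Label a value using fences (lower_outer, lower_inner, upper_inner, upper_outer)."""
    lower_outer, lower_inner, upper_inner, upper_outer = fences
    if value < lower_outer or value > upper_outer:
        return "major outlier"   # beyond an outer fence
    if value < lower_inner or value > upper_inner:
        return "minor outlier"   # between an inner and an outer fence
    return "not an outlier"      # inside the inner fences


example_fences = (1.048, 1.381, 2.269, 2.602)  # fence values from the example above
for v in (1.80, 2.40, 2.90):  # hypothetical values
    print(v, classify(v, example_fences))
```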

The IQR method is helpful because it uses percentiles, which do not assume a specific distribution.
Additionally, percentiles are relatively robust to the presence of outliers compared with methods based on the
mean and standard deviation, such as Z-scores. As the example shows, values that fall inside the two inner
fences are not outliers.
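The sketch below illustrates that robustness. It computes the fences directly from raw data with `statistics.quantiles` (Python 3.8+) and compares two hypothetical datasets that differ only in how extreme their largest value is. Quartile conventions vary between tools, so `method="exclusive"` (the standard-library default) may not reproduce quartiles computed elsewhere. `iqr_fences` is an illustrative helper.

```python
import statistics


def iqr_fences(data, method="exclusive"):
    """Return (lower_outer, lower_inner, upper_inner, upper_outer) from raw data."""
    # With n=4, statistics.quantiles returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return (q1 - 3 * iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr, q3 + 3 * iqr)


typical = [8, 9, 9, 10, 10, 11, 12]  # hypothetical values
for extreme in (25, 250):
    data = typical + [extreme]
    print(extreme, iqr_fences(data), statistics.mean(data))
```

Both datasets give the same fences, (0.75, 4.875, 15.875, 20.0), because the largest value does not enter the quartile interpolation here. Meanwhile, the mean jumps from 11.75 to 39.875.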
