
Assumption of Normality Dr. Azadeh Asgari

Azadeh Asgari is a PhD candidate in the Department of Languages & Humanities, Faculty of Educational Studies, University Putra Malaysia (UPM), Selangor, Malaysia. She also holds a Master of Science in TESL from Universiti Putra Malaysia (2009). Her research interests include second language attrition/acquisition and L2 writing.
Copyright
© Attribution Non-Commercial (BY-NC)

STATISTICS ANALYSIS

ASSUMPTION OF NORMALITY

Dr. Azadeh Asgari

Assumption of Normality


Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed. With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution. Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.

Evaluating Normality


There are both graphical and statistical methods for evaluating normality. Graphical methods include the histogram and normality plot. Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0. None of the methods is absolutely definitive.
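The rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the bias-adjusted sample formulas used here are assumed to be comparable to the skewness and kurtosis values that a statistics package reports (kurtosis is expressed as excess kurtosis, so a normal distribution scores 0).

```python
import math


def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using bias-adjusted sample formulas."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Sample standard deviation (divisor n - 1).
    s = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical")
    z3 = sum((d / s) ** 3 for d in deviations)
    z4 = sum((d / s) ** 4 for d in deviations)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return skew, kurt


def within_rule_of_thumb(values, limit=1.0):
    """True when both skewness and kurtosis lie between -limit and +limit."""
    skew, kurt = skewness_kurtosis(values)
    return -limit <= skew <= limit and -limit <= kurt <= limit
```

A call such as `within_rule_of_thumb(hours)` would then give a quick first screen before looking at the plots and hypothesis tests; `hours` stands for any list of numeric observations.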

Transformations


When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis. Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation. All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
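As a minimal sketch, the three transformations can be computed as follows. The function name is hypothetical, and the requirement that every value be strictly positive is an assumption made here because the logarithm and the inverse are undefined at zero; the course script may treat such values differently.

```python
import math


def transform_variable(values):
    """Return the log10, square root, and inverse transformations of a variable."""
    # Assumption: only strictly positive values are accepted.
    if any(v <= 0 for v in values):
        raise ValueError("all values must be greater than zero")
    return {
        "log10": [math.log10(v) for v in values],
        "sqrt": [math.sqrt(v) for v in values],
        "inverse": [1.0 / v for v in values],
    }
```

Each transformed list can then be tested for normality in the same way as the original variable.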

When Transformations Do Not Work




When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power. We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.
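One way to picture the substitution of dichotomous variables for a metric variable is the sketch below. The function name and the cut points in the usage note are hypothetical; the slides do not say how the groups should be chosen.

```python
from bisect import bisect_right


def dichotomize(values, cut_points):
    """Replace a metric variable with 0/1 indicator variables.

    The lowest group (values below the first cut point) is the reference
    group and gets 0 on every indicator.
    """
    cuts = sorted(cut_points)
    rows = []
    for v in values:
        group = bisect_right(cuts, v)  # 0 = reference group
        rows.append([1 if group == g else 0 for g in range(1, len(cuts) + 1)])
    return rows
```

For example, `dichotomize(hours, [5, 20])` would yield two indicator variables: one for values from 5 up to (but not including) 20, and one for values of 20 or more.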

Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.

Based on a diagnostic hypothesis test of normality, total hours spent on the Internet is normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Computing Explore Descriptive Statistics

To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.

Adding the Variable to be Evaluated

First, click on the variable to be included in the analysis to highlight it.

Second, click on the right arrow button to move the highlighted variable to the Dependent List.

Selecting Statistics to be Computed

To select the statistics for the output, click on the Statistics command button.

Including Descriptive Statistics


First, click on the Descriptives checkbox to select it. Clear the other checkboxes.

Second, click on the Continue button to complete the request for statistics.

Selecting Charts For The Output

To select the diagnostic charts for the output, click on the Plots command button.

Including Diagnostic Plots & Statistics


First, click on the None option button on the Boxplots panel since boxplots are not as helpful as other charts in assessing normality.

Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.

Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.

Finally, click on the Continue button to complete the request.

Completing the Specifications for the Analysis

Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.

The Histogram
[Figure: histogram of TOTAL TIME SPENT ON THE INTERNET (horizontal axis) against Frequency (vertical axis); Mean = 10.7, Std. Dev = 15.35, N = 93]

An initial impression of the normality of the distribution can be gained by examining the histogram. In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.

The Normality Plot


[Figure: Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET; Observed Value (horizontal axis) against Expected Normal (vertical axis)]

The problem with the normality of this variable's distribution is reinforced by the normality plot. If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.

The Test of Normality


Tests of Normality

                                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                   Statistic   df   Sig.        Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET   .246        93   .000        .606        93   .000

a. Lilliefors Significance Correction

Problem 1 asks about the results of the test of normality. Since the sample size (93) is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.

The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed.

Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001 instead of .000 to be clear that the probability is not really zero.)

The answer to problem 1 is false.
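The decision rule used for Problem 1 can be written out as a short sketch. The function names are hypothetical; the cutoff of 50 cases and the "less than or equal to" comparison are taken from the explanation above, and the p-value is entered as its upper bound because SPSS prints it as .000.

```python
def choose_normality_test(sample_size):
    """Pick the test by sample size, following the rule stated in the slides."""
    return "Kolmogorov-Smirnov" if sample_size > 50 else "Shapiro-Wilk"


def reject_normality(p_value, alpha):
    """Reject the null hypothesis of normality when p <= alpha."""
    return p_value <= alpha


# Values for total hours spent on the Internet (Problem 1).
sample_size = 93
alpha = 0.01
p_upper_bound = 0.001  # reported as .000, i.e. p < 0.001

test_name = choose_normality_test(sample_size)
rejected = reject_normality(p_upper_bound, alpha)
print(test_name, "-> normality rejected" if rejected else "-> normality not rejected")
```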

The Assumption of Normality Script

An SPSS script to produce all of the output that we have produced manually is available on the course web site. After downloading the script, run it to test the assumption of normality.

Select Run Script from the Utilities menu.

Selecting the Assumption of Normality Script


First, navigate to the folder containing your scripts and highlight the NormalityAssumptionAndTransformations.SBS script.

Second, click on the Run button to activate the script.

Specifications for Normality Script

First, move variables from the list of variables in the data set to the Variables to Test list box.

Second, note that by default the script computes all of the transformations of the variable. To exclude some transformations from the calculations, clear their checkboxes.

Third, click on the OK button to run the script.

The Test of Normality

The script produces the same output that we computed manually; in this example, the tests of normality.

Tests of Normality

                                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                   Statistic   df   Sig.        Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET   .246        93   .000        .606        93   .000

a. Lilliefors Significance Correction

Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?

Based on the rule of thumb for the allowable magnitude of skewness and kurtosis, total hours spent on the Internet is normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

Table of Descriptive Statistics

To answer problem 2, we look at the values for skewness and kurtosis in the Descriptives table.

Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                                   Statistic   Std. Error
Mean                                               10.731      1.5918
95% Confidence Interval for Mean   Lower Bound     7.570
                                   Upper Bound     13.893
5% Trimmed Mean                                    8.295
Median                                             5.500
Variance                                           235.655
Std. Deviation                                     15.3511
Minimum                                            .2
Maximum                                            102.0
Range                                              101.8
Interquartile Range                                10.200
Skewness                                           3.532       .250
Kurtosis                                           15.614      .495

The skewness (3.532) and kurtosis (15.614) for the variable both exceed the rule of thumb criterion of 1.0. The variable is not normally distributed.

The answer to problem 2 is false.

Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.

Based on a diagnostic hypothesis test of normality, "total hours spent on the Internet" is not normally distributed. A logarithmic transformation of "total hours spent on the Internet" results in a variable that is normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic

The Test of Normality


Tests of Normality

                                          Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                          Statistic   df   Sig.        Statistic   df   Sig.
Logarithm of NETIME [LG10(NETIME)]        .047        93   .200*       .994        93   .951
Square Root of NETIME [SQRT(NETIME)]      .118        93   .003        .868        93   .000
Inverse of NETIME [1/(NETIME)]            .288        93   .000        .495        93   .000

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Problem 3 specifically asks about the results of the test of normality for the logarithmic transformation. Since our sample size (93) is larger than 50, we use the Kolmogorov-Smirnov test.

The null hypothesis for the Kolmogorov-Smirnov test of normality states that the actual distribution of the transformed variable is equal to the expected distribution, i.e., the transformed variable is normally distributed.

Since the probability associated with the test of normality (0.200) is greater than the level of significance (0.01), we fail to reject the null hypothesis and conclude that the logarithmic transformation of total hours spent on the Internet is normally distributed.

The answer to problem 3 is true.

Other Problems on Assumption of Normality




A problem may ask about the assumption of normality for a nominal level variable. The answer will be "Incorrect application of a statistic", since there is no expectation that a nominal variable be normal.

A problem may ask about the assumption of normality for an ordinal level variable. If the variable or transformed variable is normal, the correct answer to the question is "True with caution", since we may be required to defend treating an ordinal variable as metric.

Questions will specify a level of significance to use and the statistical evidence upon which you should base your answer.

Steps in Answering Questions about the Assumption of Normality Question 1


The following is a guide to the decision process for answering problems about the normality of a variable:
1. Is the variable to be evaluated metric?
   No: Incorrect application of a statistic
   Yes: go to step 2
2. Does the statistical evidence support the normality assumption?
   No: False
   Yes: go to step 3
3. Are any of the metric variables ordinal level?
   No: True
   Yes: True with caution
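The decision process above can be sketched as a small function. The function and parameter names are hypothetical; the three inputs are judgments the analyst makes from the level of measurement and the SPSS output.

```python
def answer_normality_question(is_metric, evidence_supports_normality, is_ordinal):
    """Walk the three decision steps for a question about a variable's normality.

    is_metric: False for a nominal variable; an ordinal variable that is
        being treated as metric counts as True here.
    evidence_supports_normality: result of the test or rule of thumb named
        in the question.
    is_ordinal: True when the variable is ordinal level.
    """
    if not is_metric:
        return "Incorrect application of a statistic"
    if not evidence_supports_normality:
        return "False"
    if is_ordinal:
        return "True with caution"
    return "True"
```

For Problem 1, the variable is metric and the test rejects normality, so `answer_normality_question(True, False, False)` gives "False".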

Steps in Answering Questions about the Assumption of Normality Question 2


The following is a guide to the decision process for answering problems about the normality of a transformation:
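1. Is the variable to be evaluated metric?
   No: Incorrect application of a statistic
   Yes: go to step 2
2. Does the statistical evidence support normality for the original variable?
   Yes: False (a statement such as Problem 3 asserts that the original variable is not normal)
   No: go to step 3
3. Does the statistical evidence for the transformation support normality?
   No: False
   Yes: go to step 4
4. Is either variable ordinal level?
   No: True
   Yes: True with caution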
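The decision process for a transformation question can be sketched in the same way. The function and parameter names are hypothetical. One branch is an assumption rather than something legible in the flowchart: when the original variable already tests as normal, the sketch returns "False", because a statement like the one in Problem 3 asserts that the original variable is not normally distributed.

```python
def answer_transformation_question(is_metric, original_is_normal,
                                   transformed_is_normal, any_ordinal):
    """Walk the decision steps for a question about a transformed variable."""
    if not is_metric:
        return "Incorrect application of a statistic"
    if original_is_normal:
        # Assumed branch: the statement claims the original is not normal.
        return "False"
    if not transformed_is_normal:
        return "False"
    if any_ordinal:
        return "True with caution"
    return "True"
```

For Problem 3, the original variable is not normal and the logarithmic transformation is, so `answer_transformation_question(True, False, True, False)` gives "True".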
