Adobe Scan 06-Dec-2023 (4)

This document discusses standard scores (z-scores) and the normal probability curve, outlining their characteristics, purposes, and applications. It explains how z-scores standardize raw scores, allowing for meaningful comparisons across different distributions. The normal distribution is highlighted for its prevalence in natural phenomena and its significance in statistical analysis.

Uploaded by

Kaushiki Riya
Copyright
© © All Rights Reserved
Statistics for Behavioural and Social Sciences

Chapter 7: Standard Scores and the Normal Probability Curve

Learning Objectives

After completing this chapter you will be able to:
Understand the concept of a standard score.
Explain the characteristics, purposes and applications of z-scores.
Understand the meaning and importance of the normal distribution.
Explain the equation of the normal probability curve (NPC).
Describe the properties of the normal curve.
Transform a raw score into a z-score.
Identify areas in a normal curve below or above a particular z-score, and also between two z-scores.
Measure skewness and kurtosis as divergence from normality.
Understand the significance of skewness and kurtosis.
Understand the process of normalising a distribution of scores.
Explain the relationships that exist among the constants of the NPC.
Discuss the common causes of asymmetry.
Illustrate the various applications of the NPC.

7.1 Introduction

In Chapter 6, the binomial distribution was discussed in detail. A symmetrical binomial of the form $(1/2+1/2)^n$, with a specific exponent, was used to illustrate the binomial expansion. Instead of a specific exponent, we might consider the more general form $(1/2+1/2)^n$. As n increases in size, the distribution will approach a continuous frequency curve. This frequency curve, which is bell shaped, is the normal curve or normal distribution.

The frequency distributions of many events in nature are found in practice to be approximated closely by the normal curve, and they are said to be normally distributed. Errors of measurement and errors made in estimating population values from sample values are often assumed to be normally distributed. The frequency distributions of many physical, biological and psychological measurements are observed to approximate the normal form. Because the frequency of occurrence of many events in nature can be shown empirically to conform fairly closely to the normal curve, this curve can be used as a model in dealing with problems involving these events.
The present chapter discusses the normal probability curve (NPC) in detail. Before proceeding, however, with a detailed discussion of the normal curve, it may be helpful to the readers to consider briefly the meaning, characteristics, purposes and various applications of standard scores (or z-scores).

7.2 Standard Scores (z-scores)

Hitherto we have considered scores or measurements in the form in which they are originally obtained. Such scores are represented by the symbol $X$, with mean $\bar{X}$ and standard deviation $s$, and in their original form are spoken of as raw scores. We have also considered deviations about the mean, $x = X - \bar{X}$. These are known as deviation scores, and have a mean of 0 and a standard deviation of $s$. If we now divide the deviation about the mean by the standard deviation, we obtain what is called a standard score, represented by the symbol $z$. Thus,

$z = \dfrac{X - \bar{X}}{s} = \dfrac{x}{s}$

Thus, a standard score is nothing but a z-score. A z-score is a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean.

Standard scores have a mean of 0 and a standard deviation of 1. As previously shown, if all the scores in a sample are multiplied by a constant, the standard deviation is also multiplied by that constant. Deviation scores ($x = X - \bar{X}$) have a standard deviation $s$. Dividing every deviation score by $s$ is the same thing as multiplying by the constant $1/s$, so the standard deviation of the scores thus obtained is $s/s = 1$.

Because standard scores have zero mean and unit standard deviation, they are readily amenable to certain forms of algebraic manipulation. Many formulations can be derived more conveniently using standard scores than using raw scores or deviation scores.
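The three forms of a score described above (raw, deviation and standard) can be sketched in a few lines of Python. This is only an illustration: the five raw scores and the function names `deviation_scores` and `standard_scores` are assumptions made for the example, not taken from the text, and the sample standard deviation is computed with the n - 1 divisor.

```python
from statistics import mean, stdev


def deviation_scores(raw):
    """Return x = X - mean(X) for every raw score X."""
    m = mean(raw)
    return [x - m for x in raw]


def standard_scores(raw):
    """Return z = (X - mean(X)) / s, where s is the sample SD (n - 1 divisor)."""
    m = mean(raw)
    s = stdev(raw)
    return [(x - m) / s for x in raw]


# Hypothetical raw scores for five subjects (illustrative values only).
raw_scores = [3, 6, 7, 9, 15]

x = deviation_scores(raw_scores)
z = standard_scores(raw_scores)

for raw, dev, std in zip(raw_scores, x, z):
    print(f"X = {raw:2d}   x = {dev:+5.1f}   z = {std:+.2f}")

# Deviation scores keep the SD of the raw scores; z-scores have mean 0 and SD 1.
print("SD of deviation scores:", round(stdev(x), 4))
print("mean of z:", round(mean(z), 4), " SD of z:", round(stdev(z), 4))
```

With these illustrative values the lowest and highest z-scores round to -1.12 and +1.57, so the first subject lies a little more than one standard deviation unit below the mean and the last about one and a half units above it.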
To illustrate, a set of observations may be expressed in raw-score, deviation-score and standard-score form.

[Table: observations in raw-score, deviation-score and standard-score form]

The use of standard scores means, in effect, that we are using the standard deviation as the unit of measurement. In the above example, subject A is 1.12 standard deviations (or standard deviation units) below the mean, while subject E is 1.57 standard deviation units above the mean.

7.2.1 Characteristics of z-scores

There are three characteristics of z-scores worth noting.

First, z-scores have the same shape as the set of raw scores. Transforming the raw scores into their corresponding z-scores does not change the shape of the distribution, nor do the scores change their relative positions. All that is changed are the score values. The resulting z-scores will take on the shape of the raw scores.

Second, the mean of the z-scores always equals zero ($\bar{X}_z = 0$). This follows from the observation that the scores located at the mean of the raw scores will also be at the mean of the z-scores. The z-value for raw scores at the mean equals zero. Suppose we have 500 IQ scores with $\bar{X}=100$ and $\sigma=16$. The z transformation for a score that is at the mean of the IQ distribution is given by

$z = \dfrac{X - \bar{X}}{\sigma} = \dfrac{100 - 100}{16} = 0$

Thus, the mean of the z-distribution equals zero.

The last characteristic of importance is that the standard deviation of z-scores always equals 1. This follows because a raw score that is one standard deviation above the mean has a z-score of +1.0:

$z = \dfrac{X - \bar{X}}{\sigma} = \dfrac{(\bar{X} + 1\sigma) - \bar{X}}{\sigma} = \dfrac{\sigma}{\sigma} = +1.0$

7.2.2 Purposes of z-scores

The original, unchanged scores that are the direct result of measurement are often called raw scores. A raw score by itself does not necessarily provide much information about its position within a distribution. To make raw scores more meaningful, or more informative, they are often transformed into new values that contain more information.
This transformation is one purpose of z-scores. In particular, we will transform X values into z-scores so that the resulting z-scores tell exactly where the original scores are located.

The second purpose of the z-score is to standardise an entire distribution. A common example of a standardised distribution is the distribution of IQ scores. Although there are several different tests for measuring IQ, all of the tests are standardised so that they have a mean of 100 and a standard deviation of 15. Because all the different tests are standardised, it is possible to understand and compare IQ scores even though they come from different tests. For example, we all understand that an IQ score of 95 is a little below average, no matter which IQ test was used. Similarly, an IQ of 145 is extremely high, no matter which IQ test was used. In general terms, the process of standardising takes different distributions and makes them equivalent. The advantage of this process is that it is possible to compare distributions even though they may have been quite different before standardisation.

In summary, the process of transforming X values into z-scores serves two useful purposes:

(i) Each z-score will tell the exact location of the original X score within the distribution.
(ii) The z-scores will form a standardised distribution that can be directly compared to other distributions that also have been transformed into z-scores.

Each of these purposes is discussed as follows:

(i) z-scores and location in a distribution. One of the primary purposes of a z-score is to describe the exact location of a score within a distribution. The z-score accomplishes this goal by transforming each X value into a signed number (+ or -), so that the sign tells whether the score is located above (+) or below (-) the mean and the number tells the distance between the score and the mean in terms of the number of standard deviations or standard deviation units.
Thus, in a distribution of standardised IQ scores with $\bar{X} = 100$ and $\sigma = 15$, a score of $X = 130$ would be transformed into $z = +2.00$:

$z = \dfrac{X - \bar{X}}{\sigma} = \dfrac{130 - 100}{15} = \dfrac{30}{15} = +2.00$

This z-value of +2.0 indicates that the score is located above the mean (+) by a distance of 2 standard deviations (30 points).

In toto, a z-score specifies the precise location of each X value (or raw score) within a distribution. The sign of the z-score (+ or -) signifies whether the original score is located above the mean (positive) or below the mean (negative). The numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between $X$ and $\bar{X}$.

(ii) z-scores and the standardised distribution. It is possible to transform every X score in a distribution into a corresponding z-score. The result of this process is that the entire distribution of X scores is transformed into a distribution of z-scores. The new distribution of z-scores has characteristics or properties (as discussed earlier) that make the z-score transformation a very useful tool.

The z-score distribution is called the standardised distribution. A standardised distribution is composed of scores that have been transformed to create predetermined values of the mean ($\bar{X}$ or $\mu$) and standard deviation ($s$ or $\sigma$). A z-score distribution is an example of a standardised distribution with $\bar{X}$ or $\mu = 0$ and $s$ or $\sigma = 1$. That is, when any distribution (with any mean or standard deviation) is transformed into z-scores, the transformed distribution will always have a mean of 0 and a standard deviation of 1. The advantage of standardising is that it makes it possible to compare different scores or different individuals even though they may come from completely different distributions.

The reader should note that the sum of squares of standard scores ($\sum z^2$) is equal to $n-1$. We observe that $z^2 = (X - \bar{X})^2 / s^2$; hence,

$\sum z^2 = \dfrac{\sum (X - \bar{X})^2}{s^2} = \dfrac{\sum (X - \bar{X})^2}{\sum (X - \bar{X})^2 / (n-1)} = n - 1$

The reader should note here that if $s^2$ is defined as $\sum (X - \bar{X})^2 / n$, then the sum of squares of standard scores is $n$, and not $n-1$.
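The result that the squared standard scores sum to n - 1 (or to n when the variance is defined with divisor n) can be checked numerically. The sketch below is illustrative only: the data and the helper name `sum_of_squared_z` are assumptions introduced for the example. It relies on the standard library, where `statistics.stdev` uses the n - 1 divisor and `statistics.pstdev` uses the n divisor.

```python
from statistics import mean, pstdev, stdev


def sum_of_squared_z(raw, sd):
    """Sum of z**2 where z = (X - mean) / sd, for a caller-supplied sd."""
    m = mean(raw)
    return sum(((x - m) / sd) ** 2 for x in raw)


# Hypothetical sample of n = 5 raw scores (illustrative values only).
raw_scores = [3, 6, 7, 9, 15]
n = len(raw_scores)

# s defined with the n - 1 divisor: the sum of squares comes out as n - 1.
ss_sample = sum_of_squared_z(raw_scores, stdev(raw_scores))

# s defined with the n divisor: the sum of squares comes out as n.
ss_pop = sum_of_squared_z(raw_scores, pstdev(raw_scores))

print("n =", n)
print("sum of z^2 with (n - 1) divisor:", round(ss_sample, 6))
print("sum of z^2 with n divisor:      ", round(ss_pop, 6))
```

The same check works for any set of scores that are not all identical, because the sum of squared deviations cancels in the ratio.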
7.2.3 Applications of z-scores

The following are some of the important applications or uses of standard scores (or z-scores):

(i) Transforming X scores into z-scores. A particular raw score can be converted or transformed into a standard score (or z-score). The formula for transforming raw scores into z-scores is

$z = \dfrac{X - \mu}{\sigma}$ (for a population), or $z = \dfrac{X - \bar{X}}{s}$ (for a sample)

Example 7.1 A distribution of sample scores has a mean of $\bar{X} = 60$ with $s = 12$. Find the z-score for a score of $X = 75$.

$z = \dfrac{X - \bar{X}}{s} = \dfrac{75 - 60}{12} = \dfrac{15}{12} = 1.25$

Thus, $X = 75$ is 1.25 standard deviation units above the mean.

(ii) Transforming z-scores to X scores. The raw scores can be determined from the z-scores. The formula for computing X values is obtained directly by solving the z-score formula for X:

$z = \dfrac{X - \bar{X}}{s}$
$zs = X - \bar{X}$ (multiply both sides by $s$)
$X - \bar{X} = zs$ (transpose the equation)
$X = \bar{X} + zs$ (add $\bar{X}$ to both sides)

Example 7.2 A sample distribution has a mean of $\bar{X} = 60$ and a standard deviation of $s = 12$. Which raw score corresponds to $z = 0.25$?

Solution: $X = \bar{X} + zs = 60 + (0.25 \times 12) = 60 + 3 = 63$

Thus, the raw score (X) corresponding to $z = 0.25$ is 63.

(iii) Determining the location of a raw score in a distribution. A z-score specifies the precise location of a raw score within a distribution. The sign of the z-score (+ or -) shows whether the score lies above the mean (positive) or below the mean (negative). The numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between $X$ and $\bar{X}$.

Example 7.3 In a sample distribution of scores with $\bar{X} = 100$ and $s = 10$, the score $X = 120$ has

$z = \dfrac{X - \bar{X}}{s} = \dfrac{120 - 100}{10} = \dfrac{20}{10} = +2.0$

The z-value indicates that the score ($X = 120$) is located above the mean (+) by a distance of two standard deviations (20 points).

(iv) Standardising a distribution. z-scores can be used to standardise an entire distribution of raw scores. Every X score in a distribution can be transformed into a corresponding z-score. The result of this process is that the entire distribution of X scores is transformed into a distribution of z-scores.
Because every z-score distribution has a mean of 0 ($\bar{X} = 0$ or $\mu = 0$) and a standard deviation of 1 ($s = 1$ or $\sigma = 1$), the z-score distribution is called a standardised distribution. Such a standardised distribution has a standard meaning, a common reference value and comparability. Standardised distributions are used to make dissimilar distributions comparable.

(v) Making comparisons. Standard scores are frequently used to obtain comparability of observations obtained by different procedures. When two scores come from different distributions, it is impossible to make any direct comparison between them. Suppose a student in a psychology class received a score of $X = 60$ in a psychology test and a score of $X = 56$ in a statistics test. For which course should he expect the better grade?

Because the scores have come from two different distributions, we cannot make any direct comparison. Moreover, without additional information, it is even impossible to determine whether the student is above or below the mean in either distribution. Before we can begin to make comparisons, we must know the values of the mean and standard deviation for each distribution. Suppose the psychology test scores had $\bar{X} = 50$ and $s = 10$, and the statistics test scores had $\bar{X} = 48$ and $s = 4$. With this new information, we could sketch the two distributions, locate the student's score in each distribution and compare the two locations.

An alternative procedure is to standardise the two distributions. If the psychology test score distribution and the statistics test score distribution are both transformed into z-scores, then the two distributions will both have $\bar{X} = 0$ and $s = 1$. In the standardised distributions, we can compare the student's z-score for psychology with his z-score for statistics because the scores are coming from equivalent, standardised distributions.
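The comparison described above can be sketched as a short calculation. The means, standard deviations and the student's two scores are the ones quoted in the text; the function name `z_score` and the variable names are assumptions introduced for the illustration.

```python
def z_score(x, mean, sd):
    """Standard score of x in a distribution with the given mean and SD."""
    return (x - mean) / sd


# Psychology test: mean 50, SD 10, student's score 60.
z_psychology = z_score(60, 50, 10)

# Statistics test: mean 48, SD 4, student's score 56.
z_statistics = z_score(56, 48, 4)

print("psychology z =", z_psychology)
print("statistics z =", z_statistics)

# The larger z-score marks the better relative standing within its own class.
better = "statistics" if z_statistics > z_psychology else "psychology"
print("better relative standing:", better)
```

Only the two scores in question are transformed here; the rest of each distribution never has to be converted to make the comparison.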
However, in practice, it is not necessary to transform every raw score (or X score) in a distribution to make comparisons between two scores belonging to two separate distributions. Here, we need to transform only the two scores in question. In the student's case, we must find the z-scores for his psychology and statistics scores. For psychology, the student's z-score is

$z = \dfrac{X - \bar{X}}{s} = \dfrac{60 - 50}{10} = \dfrac{10}{10} = +1.0$

For statistics, the student's z-score is

$z = \dfrac{X - \bar{X}}{s} = \dfrac{56 - 48}{4} = \dfrac{8}{4} = +2.0$

Note that the student's z-score for statistics is +2.0, which means that his test score is 2 standard deviations above the class mean. On the other hand, his z-score for psychology is +1.0, which means that his test score is 1 standard deviation above the class mean. In terms of relative class standing, the student is doing much better in the statistics class. Note that it is meaningful to make a direct comparison of the two z-scores. A z-score of +2.0 always indicates a higher position or location than a z-score of +1.0 because all z-scores or values are based on the standardised distribution with $\bar{X}$ (or $\mu$) $= 0$ and $s$ (or $\sigma$) $= 1$.

7.3 Normal Probability Curve: The Meaning and Importance of the Normal Distribution

In the eighteenth century, gamblers were interested in the chances of beating various gambling games, and they asked mathematicians to help them out. De Moivre was the first to develop the mathematical equation for the normal curve. The concept of the curve was later developed further and applied to experimental errors in physics and astronomy, where it was found that the errors made by astronomers are distributed normally. To date, the normal curve is variously referred to as the bell-shaped curve, the Gaussian curve or De Moivre's curve (see Figure 7.1).

A frequency distribution curve which is bell shaped, bilaterally symmetrical and unimodal is called the normal distribution curve, or simply the normal curve. Such a curve results from plotting the frequencies (f) of scores of a continuous measurement variable, observed in a very large sample, against the respective scores (X).
A distribution of identical shape is obtained if the relative frequencies (f/n), obtained by dividing each observed frequency by the total frequency (n), are plotted against the respective standard scores (z-scores) computed from the raw X scores, because z-scores are derived by a linear transformation of the X scores, $z = (X - \bar{X})/s$. However, this distribution is called a normal probability distribution or NPC because its y-ordinate gives the relative frequencies or probabilities, instead of the observed frequencies, of the respective z-scores, and hence of the corresponding X scores (see Figure 7.2).

It is apparent from Figure 7.2, even upon superficial examination, that the measures are concentrated closely around the centre and taper off from this central high point or crest to the left and right. There are relatively few measures at the 'low-score' end of the scale, an increasing number up to a maximum at the middle position and a progressive falling-off towards the 'high-score' end of the scale. If we divide the area under the curve (the area between the curve and the X-axis) by a line drawn perpendicularly through the central high point to the base line, the two parts thus formed will be similar in shape and very nearly equal in area. It is clear, therefore, that the figure exhibits almost perfect bilateral symmetry.

[Figure 7.1: Normal distribution curve]

[Figure 7.2: Normal probability curve (NPC); horizontal axis: x/σ or z-scores]

This bell-shaped figure is called the NPC, or simply the normal curve, and is of great value in mental measurement. An understanding of the characteristics of the frequency distribution represented by the normal curve is essential to students of experimental psychology and mental measurement.

7.3.1 Importance of the Normal Distribution

The normal curve is a very important distribution in the behavioural sciences: in psychology, education, sociology, anthropology and so on.
Its importance will be clear from the following points.

(i) The normal distribution is a continuous distribution that has long occupied a central place in the theory of statistics. It plays a very important and pivotal role in statistical theory and practice, particularly in the area of statistical inference and statistical quality control.
(ii) Its importance is also due to the fact that, in practice, experimental results very often seem to follow the normal distribution or the bell-shaped curve.
(iii) Many statistical data concerning business and economic problems are displayed in the form of a normal distribution. In fact, the normal distribution is the cornerstone of modern statistics.
(iv) The normal curve is important not primarily because scores are assumed to be normally distributed, but because the sampling distributions of various statistics are known or assumed to be normal. Hence, the normal curve's importance is primarily in sampling statistics.
(v) Many of the variables measured in behavioural science research have distributions that quite closely approximate the normal curve. Height, weight, intelligence and achievement are a few examples.
(vi) Many of the inference tests used in analysing experiments have sampling distributions that become normally distributed with increasing sample size. The sign test and the Mann-Whitney U test are two such tests, which we shall discuss later in the text.
(vii) Many inference tests require sampling distributions that are normally distributed. The z-test, Student's t-test and the F-test are examples of inference tests that depend on this point. Thus, much of the importance of the normal curve occurs in conjunction with inferential statistics.
(viii) The normal distribution occupies a prominent place in statistics because it has some properties that make it applicable to a great many situations in which it is necessary to make inferences by taking samples.
Thus, the normal distribution is a useful sampling distribution.
(ix) The normal distribution comes close to fitting the actual observed frequency distributions of many phenomena, including human characteristics (weights, heights, IQs), outputs from physical processes (dimensions and yields) and other measures of interest to managers in both the public and private sectors.
(x) The normal distribution has the remarkable property stated in the central limit theorem (Lindeberg and Levy, 1925). According to this theorem, as the sample size n increases, the distribution of the mean $\bar{X}$ of a random sample taken from practically any population approaches a normal distribution (with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$). Thus, if samples of large size n are drawn from a population that is not normally distributed, the successive sample means will nevertheless form a distribution that is approximately normal. As the size of the sample is increased, the sample means will tend to be normally distributed. The central limit theorem gives the normal distribution its central place in the theory of sampling, since many important problems can be solved by this single pattern of sampling variability. As a result, the work on statistical inference is made simpler. This characteristic makes it possible to determine the minimum and maximum limits within which the population values lie. For example, the range of the population mean plus or minus three standard deviations ($\mu \pm 3\sigma$) covers 99.73%, or almost all, of the items.
(xi) The normal distribution has numerous mathematical properties which make it popular and comparatively easy to manipulate. For example, the moments of the normal distribution are expressed in a simple form.
(xii) The frequency distributions of many events in nature are found in practice to be approximated closely by the normal curve, and they are said to be normally distributed. Errors of measurement and errors made in estimating population values from sample values are often assumed to be normally distributed.
The frequency distributions of many physical, biological and psychological measurements are observed to approximate the normal form. Because the frequency of occurrence (which is called the probability) of many events in nature can be shown empirically to conform fairly closely to the normal curve, this curve can be used as a model in dealing with problems involving these events. To date, the normal probability model is one of the most important probability models in statistical analysis.

7.3.2 Equation of the Normal Probability Curve

In tossing n coins, the frequency distribution of heads or tails is approximated more closely by the normal distribution as n increases in size. The normal curve is the limiting form of the symmetrical binomial. The NPC is a theoretical distribution of population scores. It is a bilaterally symmetrical, unimodal and bell-shaped curve that is described by the following equation:

$y = \dfrac{N}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2 / 2\sigma^2}$

where
y = height of the curve for particular values of X
N = total frequency of the distribution, or the total area under the curve
σ = standard deviation of the distribution
e = base of Napierian logarithms, a constant equal to 2.7183
X = any score in the distribution
μ = mean of the distribution.

Note that we have used the notations μ and σ in the above formula to represent the mean and standard deviation, instead of $\bar{X}$ and s, because the formula is a theoretical model. Presumably, μ and σ may be regarded as population parameters. If N, μ and σ are known, different values of X may be substituted in the equation and the corresponding values of y obtained. If paired values of X and y are plotted graphically, they will form a normal curve with mean μ, standard deviation σ and area N.

The normal curve is usually written in standard-score form. Standard scores have a mean of 0 and a standard deviation of 1. Thus, μ = 0 and σ = 1. The area under the curve is taken as unity, that is, N = 1.
With these substitutions, we may write the equation as follows:

$y = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

Here, z is a standard score on X and is equal to $(X - \mu)/\sigma$. The score z is a deviation in standard deviation units measured along the base line of the curve from the mean of 0, deviations to the right of the mean being positive and those to the left being negative. The curve has unit area and unit standard deviation.

By substituting different values of z in the above formula, different values of y may be calculated. When z = 0, $y = 1/\sqrt{2\pi} = 0.3989$. This follows from the fact that $e^0 = 1$; any term raised to the power 0 is equal to 1. Thus, the height of the ordinate (y) at the mean of the normal curve in standard-score form is given by the number 0.3989. Likewise, the height of the curve may be calculated for other values of z; in practice, the student is not required to substitute different values of z in the formula.

The general form of the curve can be observed by inspection of Figure 7.3. The curve is symmetrical. It is asymptotic at the extremities; that is, it approaches but never reaches the horizontal axis. It can be said to extend from minus infinity to plus infinity. The area under the curve is finite.

7.3.3 Properties of the Normal Curve

For reference in all cases, the unit normal curve is the standard form. It is computed taking the sample size (n), the standard deviation (σ) and the length of the class intervals (i) of the distribution as 1.0 each.

[Figure 7.3: Normal curve showing the height of the ordinate at different values of x/σ or z (ordinates shown include 0.2420, 0.0540 and 0.0044)]

The following are some of the important properties or characteristics of the normal curve:

(i) The unit NPC is bilaterally symmetrical.
(ii) The mean, median and mode, the three measures of central tendency, coincide; that is, they fall on the same point at the middle of the curve.
(iii) The maximum ordinate of the curve occurs at the mean, that is, where z = 0, and in the unit normal curve it is equal to 0.3989.
(iv) The curve is asymptotic. It approaches but does not meet the horizontal axis, and extends from minus infinity ($-\infty$) to plus infinity ($+\infty$).
(v) The points of inflection of the curve occur at one standard deviation unit above and below the mean (i.e., at $\pm 1\sigma$). Thus, the curve changes from convex to concave in relation to the horizontal axis at these points.
(vi) The area of the unit normal curve is 1 (N = 1), its standard deviation is 1 (σ = 1), its variance is 1 (σ² = 1) and its mean is 0 (μ = 0).
(vii) The mean lies in the middle of the curve and divides the curve into two equal halves.
(viii) Practically the whole area of the NPC (99.73%) lies within $\pm 3\sigma$, below and above the mean.
(ix) The curve has two tails or ends: the right-hand tail or high end and the left-hand tail or low end. The x/σ value or standard score to the right of the mean is positive and to the left is negative.
(x) Roughly 68% of the area of the curve falls within the limits of plus or minus one standard deviation ($\pm 1\sigma$) unit from the mean. In the unit normal curve, the limits $z = \pm 1.96$ include 95% and the limits $z = \pm 2.58$ include 99% of the total area of the curve, 5% and 1% of the area, respectively, falling beyond these limits.
(xi) The normal distribution gives the probable distribution of scores of a continuous measurement variable according to the laws of probability. It is, thus, a continuous probability distribution.
(xii) The normal distribution is bilaterally symmetrical and free from skewness; its coefficient of skewness amounts to zero.
(xiii) The normal distribution is taken as a standard for the degree of peakedness or kurtosis. It is mesokurtic, with its percentile coefficient of kurtosis amounting to 0.263 and its moment coefficient (of excess kurtosis) being zero.
(xiv) The fractional area of the bilaterally symmetrical normal curve between any two given z-scores is identical in both halves of the curve.
Thus, the fractional area between the scores of +1 (i.e., $\mu + 1\sigma$) and +2 (i.e., $\mu + 2\sigma$) is identical with that between the scores of -1 (i.e., $\mu - 1\sigma$) and -2 (i.e., $\mu - 2\sigma$).
(xv) The heights of the ordinates at a particular z-score in both halves of the bilaterally symmetrical normal curve are the same. For example, the height of an ordinate at +1z equals the height of an ordinate at -1z.

7.3.4 Areas under the Normal Curve

For practical purposes, it is necessary to know or ascertain the proportion of the area under the normal curve between ordinates at different points on the base line (or x-axis). We may wish to know three things: (a) the proportion of the area under the curve between an ordinate at the mean and an ordinate at any specified point either above or below the mean; (b) the proportion of the total area above or below an ordinate at any point on the base line; and (c) the proportion of the area falling between ordinates at any two points on the base line.

Table A of the appendix shows the proportion of the area between the mean of the unit normal curve and ordinates extending from z = 0 to z = 3. Let us suppose we want to find out the area under the curve between the ordinates at z = 0 and z = +1.0. We note from Table A that this area is 0.3413 of the total. Thus, approximately 34% of the total area falls between the mean and 1 standard deviation unit above the mean. The proportion of the area of the curve between z = 0 and z = 2 is 0.4772. Thus, about 47.7% of the area of the curve falls between the mean and 2 standard deviation units above the mean. The proportion of the area between z = 0 and z = 3 is 0.49865, or a little less than 49.9%.

Since the unit normal curve is bilaterally symmetrical, the proportion of the area falling between z = 0 and z = -1 is also 0.3413; between z = 0 and z = -2 it is 0.4772; and between z = 0 and z = -3 it is 0.49865. Therefore, the proportion of the area falling between the limits $z = \pm 1$ is 0.3413 + 0.3413 = 0.6826, or roughly 68%. The proportion of the area falling between the limits $z = \pm 2$ is 0.4772 + 0.4772 = 0.9544, or about 95%.
The proportion of the area between the limits $z = \pm 3$ is 0.49865 + 0.49865 = 0.9973, or 99.73%. The area outside these latter limits is very small and is only 0.27%. For rough practical purposes, the curve is sometimes taken as extending from z = -3 to z = +3 (see Figure 7.4).

[Figure 7.4: Normal curve showing areas between ordinates at different values of x/σ or z]

[Figure 7.5: Normal curve showing area below and above a point of z = 1 (area above z = 1: 0.1587)]

Let us now consider the determination of the proportion of the total area above or below any point on the base line of the curve. For illustration, let the point be z = 1. We know that the proportion of the area between the mean and z = 1 is 0.3413. The proportion of the area below the mean or above the mean is 0.5000, because the entire area of the curve is 1 and the mean divides the area of the curve into two equal halves. Therefore, the proportion of the total area below z = 1 is 0.5000 + 0.3413 = 0.8413. Thus, the proportion of the total area above this point (i.e., above z = 1) is 1.0000 - 0.8413 = 0.1587 (or 0.5000 - 0.3413 = 0.1587). Similarly, the proportion of the area above or below any point on the base line can be readily determined or ascertained (see Figure 7.5).

Let us consider the problem of finding the area between ordinates at any two points on the base line. Let us assume that we require the area between z = 0.5 and z = 1.5. From Table A of the appendix, we note that the proportion of the area between the mean and z = 0.5 is 0.1915. We also note that the proportion of the area between the mean and z = 1.5 is 0.4332. Therefore, the area between z = 0.5 and z = 1.5 is obtained by subtracting the smaller area from the larger area, and thus it is 0.4332 - 0.1915 = 0.2417. The area for any other segment of the curve may be similarly obtained (see Figure 7.6).

On certain occasions we wish to find the values of z which include some specified proportion of the total area.
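The kinds of area discussed in this section can be checked without a printed table. The sketch below is an illustration, not the text's own method: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for Table A, and the helper names (`area_mean_to`, `area_below`, `area_above`, `area_between`, `central_limits`) are hypothetical names introduced here.

```python
from statistics import NormalDist

# Unit normal curve: mean 0, standard deviation 1, total area 1.
unit = NormalDist(mu=0.0, sigma=1.0)


def area_mean_to(z):
    """Proportion of area between the mean (z = 0) and the ordinate at z."""
    return abs(unit.cdf(z) - 0.5)


def area_below(z):
    """Proportion of the total area below the ordinate at z."""
    return unit.cdf(z)


def area_above(z):
    """Proportion of the total area above the ordinate at z."""
    return 1.0 - unit.cdf(z)


def area_between(z1, z2):
    """Proportion of area between the ordinates at z1 and z2."""
    return abs(unit.cdf(z2) - unit.cdf(z1))


def central_limits(proportion):
    """Symmetric limits (-z, +z) that enclose the given central proportion."""
    z = unit.inv_cdf(0.5 + proportion / 2.0)
    return -z, z


print("mean to z = 1:        ", round(area_mean_to(1.0), 4))
print("below z = 1:          ", round(area_below(1.0), 4))
print("above z = 1:          ", round(area_above(1.0), 4))
print("between z = 0.5, 1.5: ", round(area_between(0.5, 1.5), 4))
print("limits for 95%:       ", tuple(round(v, 2) for v in central_limits(0.95)))
```

Rounded to four places, these calls give the proportions quoted above for z = 1 and for the segment from z = 0.5 to z = 1.5; `central_limits` runs the table lookup in reverse, going from a required proportion to the z values that enclose it.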
For example, the values of z above and below the mean which include a proportion 0.95 of the area may be required. We select a value of z above the mean which includes a proportion 0.475 (0.95/2 = 0.475) of the total area, and a value of z below the mean which also includes a proportion 0.475 of the total area. From Table A of the appendix, we observe that the proportion 0.475 of the area falls between z = 0 and z = 1.96. Since the curve is symmetrical, the proportion 0.475 of the area also falls between z = 0 and z = −1.96. Thus, a proportion 0.95, or 95%, of the total area falls within the limits z = ±1.96, and a proportion 0.05, or 5%, falls outside these limits. Similarly, it may be shown that 99% of the area of the curve falls within, and 1% outside, the limits z = ±2.58. Figure 7.7 is a normal curve showing the values of z which include a proportion 0.95 of the total area.

7.3.5 Areas under the Normal Curve: Illustrative Example

The distribution of IQ scores obtained by the application of a particular test is approximately normal with a mean of 100 and a standard deviation of 15. We are required to estimate what percentage of individuals in the population have IQs of 120 and above. First of all, we find the z transformation of the IQ score of 120. The IQ of 120 in standard score form is z = (120 − 100)/15 = 1.33. Thus, an IQ of 120 is 1.33 standard deviation units above the mean.
Thus, the mi foanaaeaaaee i = . The standard score scale has a mean of rd dev} otLHere, see nase ranaform standard scores to the original score a Of1Qs with a mean of 100 and a standard deviation of 15. To transform standard eae Qs, we multipy standard score by 15 and add 100. That means, X=zo+y The IQ score below the mean ¢ (-0675)05)+100=100 - 10.125 =89875=90. The IQ score above the mean is (+0.675)15). 499 =10.125+100=110.125=110. Thus, we estimate that about 50% of the populations have | within a range of roughly 90-110. Therefore, from a normal curve we can find out the area corresponding to any raw score and also find out the raw score corresponding to a giver area. aneen ‘ation 7.4 Measuring Divergence from Normality A normal distribution is described as an exactly perfect bell-shaped curve. However, such perfect symmetrical curve rarely exists in our actual dealings as we usually cannot measurean entire population. Instead, we work on representative samples of the population. Therefore, in actual practice, the slightly deviated or distorted bell-shaped curve is also accepted as the normal curve on the: assumption of normal distribution of the characteristics measured in the entire population. From the above | discussion, it should not be assumed that the distribution of the data inall cases will always lead to normal or approximately normal curves. In the cases where the scores of the individuals in the group seriously deviate from the average, the curves representing these distributions also deviate from the shape of a normal curve. This deviation or dives gence from normality tends to vary in two ways. in terms of skewness and in terms of kurtosis. The skewness and kurtosis are known as the two types of errors of the normal distribution, which have been discussed in the following. 
Figure 7.6 Normal curve showing area between ordinates at z = 0.5 and z = 1.5

Figure 7.7 Normal curve showing the values of z which include a proportion 0.95 of the total area

7.4.1 Skewness

In some frequency distributions, scores are concentrated at one end of the scale and are much fewer towards the other end. Such an asymmetric distribution has its peak or mode towards the former end and a longer and more pointed tail at the other end. Such a distribution is called a skewed distribution. Skewness is the asymmetry of the distribution.

7.4.1.1 Properties of Skewed Distributions

The following are some of the crucial characteristics or properties of skewed distributions:

(a) A skewed distribution cannot be bisected into two symmetrical halves, because one of its tails is longer and more tapering than the other.

(b) Skewness may be positive or negative according as the pointed longer tail runs down to the high-value (right) end or the low-value (left) end of the scale, respectively. The scores are more concentrated towards the respective opposite ends of the scale. In other words, a distribution is said to be skewed when the three measures of central tendency fall at different points in the distribution, and the balance (or centre of gravity) is shifted to one side or the other, to the left or to the right. A distribution is said to be positively skewed, or skewed to the right, when scores are massed at the low (left) end of the scale and are spread out gradually towards the high (right) end, as shown in Figure 7.8. A distribution is said to be negatively skewed, or skewed to the left, when scores are massed at the high (right) end of the scale and are spread out more gradually towards the low (left) end, as shown in Figure 7.9.

(c) The mean, median and mode fail to coincide in an asymmetric distribution.
Both median and mean are displaced from the mode towards the skewed tail, but the displacement of the mean considerably exceeds that of the median. So, Mean > Median > Mode in positively skewed distributions, while Mode > Median > Mean in negatively skewed ones. In other words, the mean is pulled further towards the skewed end of the distribution than the median. In fact, the greater the gap between mean and median, the greater the skewness. Moreover, when skewness is negative, the mean lies to the left of the median; and when skewness is positive, the mean lies to the right of the median. In both cases of skewness, positive or negative, the median lies between the mean and the mode. Because the deviation of the mean exceeds that of the median, the median is the more dependable of the two as a measure of central value in a skewed distribution.

(d) Unlike symmetrical distributions, where all odd-order central moments have a zero value, m₃ and higher odd-order central moments have positive or negative values in positively or negatively skewed distributions, respectively, their signs and magnitudes indicating the direction and degree of skewness.

(e) Unlike symmetrical distributions, where the first and third quartiles (Q₁ and Q₃) are equidistant from the second quartile (Q₂ or median), in an asymmetric distribution the quartile on the side of the skewed tail lies farther from the median. Therefore, (Q₃ − Q₂) > (Q₂ − Q₁) in positively skewed distributions and (Q₃ − Q₂) < (Q₂ − Q₁) in case of negative skewness.

Figure 7.8 Positive skewness (the curve inclines more to the right); from left to right: mode, median, mean

Figure 7.9 Negative skewness (the curve inclines more to the left); from left to right: mean, median, mode

7.4.1.2 Measures of Skewness

The following measures of skewness are expressed in pure numbers, free from the unit of the variable.

(a) Pearson's first coefficient of skewness:

$Sk = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \dfrac{\bar{X} - Mo}{s}$

(b) Pearson's second coefficient of skewness:

$Sk = \dfrac{3(\bar{X} - Mdn)}{s}$

The second coefficient is preferable to the first because of the difficulty in estimating the mode of a distribution precisely. In symmetric distributions like the normal distribution, X̄ = Mdn = Mo; so both of Pearson's coefficients amount to zero in such distributions.

(c) Bowley's quartile coefficient of skewness:

$Sk = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{2Q}$

where Q is the quartile deviation. In symmetric distributions like the normal distribution, Q₃ − Q₂ = Q₂ − Q₁; so the quartile coefficient amounts to zero in such distributions.

(d) Moment coefficient of skewness (γ₁ or g₁):

$\gamma_1 = \dfrac{m_3}{s^3}$

where the third moment m₃ about the mean is divided by the third power of the standard deviation (s). The other formula reads as follows:

$g_1 = \dfrac{m_3}{m_2\sqrt{m_2}}$

where the third moment m₃ about the mean is divided by the quantity m₂√m₂, m₂ being the second moment about the mean. Because m₃ amounts to 0 for a symmetric distribution, γ₁ or g₁ has a zero value for symmetric distributions like the normal distribution.

(e) Percentile coefficient of skewness:

$Sk = \dfrac{P_{90} + P_{10}}{2} - P_{50}$

For all five coefficients of skewness, a value of 0 indicates a symmetrical distribution without skewness. A positive or negative value indicates, respectively, a positive or negative skewness. The degree of skewness is given by the magnitude of the coefficient.

Example 7.4 Compute Pearson's second coefficient of skewness for a frequency distribution of alpha scores having a mean of 170.8, a standard deviation of 12.63 and a median of 172.0.

Solution X̄ = 170.8, s = 12.63, Mdn = 172.0.
$Sk = \dfrac{3(\bar{X} - Mdn)}{s} = \dfrac{3(170.8 - 172.0)}{12.63} = \dfrac{3(-1.2)}{12.63} = \dfrac{-3.6}{12.63} \approx -0.285$

This small negative value shows that the distribution is slightly negatively skewed and that the distribution of data approaches the normal form.

Example 7.5 Calculate the percentile coefficient of skewness for a frequency distribution of differential aptitude test scores of 50 students, having P₁₀ = 152.0, P₅₀ = 172.0 and P₉₀ = 187.0.

Solution P₁₀ = 152.0; P₅₀ = 172.0; P₉₀ = 187.0.

$Sk = \dfrac{P_{90} + P_{10}}{2} - P_{50} = \dfrac{187.0 + 152.0}{2} - 172.0 = 169.5 - 172.0 = -2.5$

The obtained skewness of −2.5 shows that the distribution of data is a negatively skewed curve.

Example 7.6 Compute the quartile coefficient of skewness for the frequency distribution of human body weights, having Q₁, Q₂ and Q₃ of, respectively, 60.4, 63.3 and 65.8 kg.

Solution Q₁ = 60.4 kg; Q₂ = 63.3 kg; Q₃ = 65.8 kg.

$Sk = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \dfrac{(65.8 - 63.3) - (63.3 - 60.4)}{65.8 - 60.4} = \dfrac{2.5 - 2.9}{5.4} = \dfrac{-0.4}{5.4} \approx -0.074$

or, with $Q = \dfrac{Q_3 - Q_1}{2} = \dfrac{65.8 - 60.4}{2} = \dfrac{5.4}{2} = 2.7$,

$Sk = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{2Q} = \dfrac{(65.8 - 63.3) - (63.3 - 60.4)}{2 \times 2.7} = \dfrac{-0.4}{5.4} \approx -0.074$

This small negative value shows that the distribution of body weights is slightly negatively skewed and that it closely approaches the normal form.

Example 7.7 Find the moment coefficient of skewness for a distribution of test scores having m₃ = 10.80, m₂ = 9.20 and s = 3.03.

Solution m₃ = 10.80; m₂ = 9.20; s = 3.03.

$Sk = \gamma_1 = \dfrac{m_3}{s^3} = \dfrac{10.80}{(3.03)^3} = \dfrac{10.80}{27.82} \approx 0.388$

or

$Sk = g_1 = \dfrac{m_3}{m_2\sqrt{m_2}} = \dfrac{10.80}{9.20\sqrt{9.20}} = \dfrac{10.80}{9.20 \times 3.03} = \dfrac{10.80}{27.88} \approx 0.387$

The difference between γ₁ and g₁ is negligible (0.388 − 0.387 = 0.001) and is due to the rounding of decimals. The distribution of scores is a positively skewed curve.

Example 7.8 Compute the moment coefficient of skewness for the following set of scores: 6, 8, 10, 16 and 20.

Solution

| Scores (X) | X − X̄ | (X − X̄)² | (X − X̄)³ |
| 6 | −6 | 36 | −216 |
| 8 | −4 | 16 | −64 |
| 10 | −2 | 4 | −8 |
| 16 | 4 | 16 | 64 |
| 20 | 8 | 64 | 512 |
| ΣX = 60 | | Σ = 136 | Σ = 288 |

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{60}{5} = 12$

$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{136}{5}} = \sqrt{27.2} \approx 5.215$

$m_2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{136}{5} = 27.2$

$m_3 = \dfrac{\sum (X - \bar{X})^3}{n} = \dfrac{288}{5} = 57.6$

Moment coefficient of skewness (γ₁ or g₁):
$Sk = \gamma_1 = \dfrac{m_3}{s^3} = \dfrac{57.6}{(5.215)^3} = \dfrac{57.6}{141.83} \approx 0.406$

or

$Sk = g_1 = \dfrac{m_3}{m_2\sqrt{m_2}} = \dfrac{57.6}{27.2\sqrt{27.2}} = \dfrac{57.6}{27.2 \times 5.215} = \dfrac{57.6}{141.85} \approx 0.406$

The amounts of skewness obtained by γ₁ and g₁ agree, since s = √m₂; any small difference between the two in hand computation is due to the rounding of decimals. The value is positive, so this is a positively skewed set of numbers.

7.4.2 Kurtosis

The term 'kurtosis' refers to the flatness or peakedness of a frequency distribution as compared with the normal distribution. Kurtosis is usually of three types: mesokurtic, leptokurtic and platykurtic. The normal distribution is said to be mesokurtic, and its peakedness of a medium order is taken as the standard (see Figure 7.10). A frequency distribution more peaked than the normal is said to be leptokurtic; one flatter than the normal is called platykurtic.

Figure 7.10 Different forms of kurtosis: leptokurtic, mesokurtic and platykurtic curves (horizontal axis: x/σ or z-scores)

A leptokurtic distribution has a higher peak, thicker tails and a narrower body than the normal distribution. Thus, a leptokurtic distribution has higher frequencies of scores near its centre and at its two tails than a normal distribution with the same mean and variance, but has lower frequencies of scores of intermediate magnitudes. A platykurtic distribution is flatter at its centre, broader in the body and thinner at the tails than the normal distribution because, compared to the latter, it carries lower frequencies of scores near its centre and at its tails, but higher frequencies of scores of intermediate magnitudes.

7.4.2.1 Measures of Kurtosis

(a) Percentile coefficient of kurtosis (Ku):

$Ku = \dfrac{Q}{P_{90} - P_{10}}$ or $Ku = \dfrac{P_{75} - P_{25}}{2(P_{90} - P_{10})}$

For mesokurtic, platykurtic and leptokurtic distributions, the amount of kurtosis is equal to, greater than and less than 0.263, respectively. In other words, the amount of kurtosis for a normal curve is 0.263.
If the value of kurtosis is greater than 0.263, the distribution is said to be platykurtic, and if the value is less than 0.263, the distribution is called leptokurtic.

(b) Moment coefficient of kurtosis (γ₂ or g₂):

$Ku = \gamma_2 = \dfrac{m_4}{s^4} - 3$ or $Ku = g_2 = \dfrac{m_4}{m_2^2} - 3$

where s is the standard deviation (SD), and m₂ and m₄ are, respectively, the second and fourth central moments about the mean. For mesokurtic, platykurtic and leptokurtic distributions, γ₂ or g₂ amounts to zero, a negative value and a positive value, respectively. The number 3 comes about because the ratio m₄/m₂² = 3 for a normal distribution. This means that g₂ = 0 for a normal distribution. For a leptokurtic distribution g₂ is greater than zero, and for a platykurtic distribution g₂ is less than zero.

Example 7.9 Compute the percentile coefficient of kurtosis for a frequency distribution of test scores having a 10th percentile of 152.0, a 25th percentile of 162.62, a 75th percentile of 179.19 and a 90th percentile of 187.0.

Solution P₁₀ = 152.0; P₂₅ = 162.62; P₇₅ = 179.19; P₉₀ = 187.0.

$Ku = \dfrac{P_{75} - P_{25}}{2(P_{90} - P_{10})} = \dfrac{179.19 - 162.62}{2(187.0 - 152.0)} = \dfrac{16.57}{2 \times 35} = \dfrac{16.57}{70} \approx 0.237$

or, with $Q = \dfrac{Q_3 - Q_1}{2} = \dfrac{179.19 - 162.62}{2} = \dfrac{16.57}{2} = 8.285$,

$Ku = \dfrac{Q}{P_{90} - P_{10}} = \dfrac{8.285}{187.0 - 152.0} = \dfrac{8.285}{35} \approx 0.237$

Since the obtained Ku is less than 0.263, the distribution is slightly leptokurtic.

Example 7.10 Calculate the percentile coefficient of kurtosis for a frequency distribution of differential aptitude test scores, having P₁₀ = 80.60, P₂₅ = 88.89, P₇₅ = 108.15 and P₉₀ = 116.03.

Solution P₁₀ = 80.60, P₂₅ = 88.89, P₇₅ = 108.15 and P₉₀ = 116.03.

$Ku = \dfrac{P_{75} - P_{25}}{2(P_{90} - P_{10})} = \dfrac{108.15 - 88.89}{2(116.03 - 80.60)} = \dfrac{19.26}{2 \times 35.43} = \dfrac{19.26}{70.86} \approx 0.272$

Since the obtained Ku exceeds 0.263, the distribution is slightly platykurtic.

Example 7.11 Calculate the moment coefficient of kurtosis (g₂) for a distribution of test scores whose second and fourth central moments about the mean are 8.0 and 108.80, respectively.

Solution m₂ = 8.0; m₄ = 108.80.

$g_2 = \dfrac{m_4}{m_2^2} - 3 = \dfrac{108.80}{(8.0)^2} - 3 = \dfrac{108.80}{64.00} - 3 = 1.7 - 3 = -1.3$

Since the obtained g₂ is negative, the distribution is platykurtic.
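The moment and percentile coefficients worked out by hand above can be reproduced in a few lines. The following is a minimal sketch in plain Python; the function names are illustrative, not the book's, and the score set 6, 8, 10, 12, 14 is a hypothetical one chosen only because its central moments (m₂ = 8.0, m₄ = 108.8) match those given in Example 7.11.

```python
def central_moment(scores, k):
    """k-th central moment: the mean of (X - mean)**k."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def moment_skewness(scores):
    """g1 = m3 / (m2 * sqrt(m2)); zero for a symmetrical set of scores."""
    m2 = central_moment(scores, 2)
    m3 = central_moment(scores, 3)
    return m3 / (m2 * m2 ** 0.5)


def moment_kurtosis(scores):
    """g2 = m4 / m2**2 - 3; zero for a normal (mesokurtic) distribution."""
    m2 = central_moment(scores, 2)
    m4 = central_moment(scores, 4)
    return m4 / m2 ** 2 - 3


def percentile_kurtosis(p10, p25, p75, p90):
    """Ku = (P75 - P25) / (2 * (P90 - P10)); about 0.263 for a normal curve."""
    return (p75 - p25) / (2 * (p90 - p10))


skew_7_8 = moment_skewness([6, 8, 10, 16, 20])                # scores of Example 7.8
kurt_7_11 = moment_kurtosis([6, 8, 10, 12, 14])               # hypothetical scores, see lead-in
ku_7_9 = percentile_kurtosis(152.0, 162.62, 179.19, 187.0)    # Example 7.9
ku_7_10 = percentile_kurtosis(80.60, 88.89, 108.15, 116.03)   # Example 7.10

print(round(skew_7_8, 3), round(kurt_7_11, 2))
print(round(ku_7_9, 3), round(ku_7_10, 3))
```

A positive `moment_skewness` indicates positive skewness, a negative `moment_kurtosis` a platykurtic set, and a `percentile_kurtosis` below or above 0.263 a leptokurtic or platykurtic distribution, respectively, as in the examples.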
Example 7.12 Compute the moment coefficient of kurtosis for the following set of scores: 6, 9, 10, 11 and 14.

Solution

| Scores (X) | X − X̄ | (X − X̄)² | (X − X̄)⁴ |
| 6 | −4 | 16 | 256 |
| 9 | −1 | 1 | 1 |
| 10 | 0 | 0 | 0 |
| 11 | 1 | 1 | 1 |
| 14 | 4 | 16 | 256 |
| ΣX = 50 | | Σ = 34 | Σ = 514 |

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{50}{5} = 10$

$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{34}{5}} = \sqrt{6.8} \approx 2.608$

$m_2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{34}{5} = 6.8$

$m_4 = \dfrac{\sum (X - \bar{X})^4}{n} = \dfrac{514}{5} = 102.8$

Moment coefficient of kurtosis (γ₂ or g₂):

$\gamma_2 = \dfrac{m_4}{s^4} - 3 = \dfrac{102.8}{(2.608)^4} - 3 = \dfrac{102.8}{46.26} - 3 \approx 2.222 - 3 = -0.778$

$g_2 = \dfrac{m_4}{m_2^2} - 3 = \dfrac{102.8}{(6.8)^2} - 3 = \dfrac{102.8}{46.24} - 3 \approx 2.223 - 3 = -0.777$

The amounts of kurtosis obtained by γ₂ and g₂ are almost the same; the difference between the two is negligible and is due to the rounding of decimals. Because the amount of kurtosis is negative, this is a platykurtic set of numbers.

7.5 Normalising a Distribution of Scores

(a) The observed scores are arranged in a continuous frequency distribution with class intervals of equal length (i).

(b) The midpoint (X_c) of each class interval, as well as X̄ and s of the sample, is computed.

(c) Each X_c is transformed into a z-score by the following formula:

$z = \dfrac{X_c - \bar{X}}{s}$

(d) The unit normal-curve table (Table A in the appendix) is used to find the height of the ordinate (y) of the unit normal curve at each computed z-score.

(e) The expected frequency (f_e) of each class interval of the best-fitting normal distribution is computed in the following way (see Table 7.1):

$f_e = \left(\dfrac{iN}{s}\right) y$

Table 7.1 Normalising a distribution of scores: Computation of the best-fitting normal distribution

| (1) Class intervals | (2) f_o | (3) X_c | (4) X_c − X̄ | (5) z | (6) y | (7) f_e |
| 90-94 | 1 | 92 | 28.1 | 2.30 | 0.0283 | 1.7 |
| 85-89 | 3 | 87 | 23.1 | 1.89 | 0.0669 | 4.1 |
| 80-84 | 8 | 82 | 18.1 | 1.48 | 0.1334 | 8.2 |
| 75-79 | [illegible] | 77 | 13.1 | 1.07 | 0.2251 | 13.8 |
| 70-74 | 28 | 72 | 8.1 | 0.66 | 0.3209 | 19.7 |
| 65-69 | 36 | 67 | 3.1 | 0.25 | 0.3867 | 23.8 |
| 60-64 | [illegible] | 62 | −1.9 | −0.16 | 0.3939 | 24.2 |
| 55-59 | 18 | 57 | −6.9 | −0.57 | 0.3391 | 20.8 |
| 50-54 | 10 | 52 | −11.9 | −0.97 | 0.2492 | 15.3 |
| 45-49 | 8 | 47 | −16.9 | −1.38 | 0.1539 | 9.5 |
| 40-44 | 8 | 42 | −21.9 | −1.79 | 0.0804 | 4.9 |
| 35-39 | 8 | 37 | −26.9 | −2.20 | 0.0355 | 2.2 |
| 30-34 | 1 | 32 | −31.9 | −2.61 | 0.0132 | 0.8 |

(f) Each f_e may be graphically plotted against the X_c
of the corresponding class interval for drawing the best-fitting normal curve.

Example 7.13 A psychological test yields the following distribution of scores of 150 students, with a mean of 63.9 and a standard deviation of 12.2. Compute the expected frequencies of the best-fitting normal distribution.

| Class intervals | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 | 65-69 | 70-74 | 75-79 | 80-84 | 85-89 | 90-94 |
| Frequencies | 1 | 8 | 8 | 8 | 10 | 18 | [illegible] | 36 | 28 | [illegible] | 8 | 3 | 1 |

Solution

(a) The data are arranged in the first two columns of Table 7.1. Column 1 lists the class intervals and Column 2 shows the observed frequencies (f_o). N = 150, X̄ = 63.9, s = 12.2, i = 5.

(b) The midpoint (X_c) of each class interval is computed and recorded in Column 3 of Table 7.1.

(c) The deviation of each midpoint (X_c) from the mean (X̄) is calculated and recorded in Column 4.

(d) Each deviation score (x = X_c − X̄) is transformed into a z-score by dividing it by the standard deviation (s). For example, for the interval 90-94,

z = (X_c − X̄)/s = (92 − 63.9)/12.2 = 28.1/12.2 = 2.30.

The z-scores corresponding to each interval are recorded in Column 5 of Table 7.1.

(e) The height (y) of the ordinate at each computed z-score, neglecting the algebraic sign of the latter, is then read from the unit normal-curve table (Table A given in the appendix). For example, for the z-score of 2.30, y = 0.0283. These heights of ordinates are recorded in Column 6.

(f) The expected frequency (f_e) of the best-fitting normal distribution is computed for each z-score by multiplying its y value by the ratio iN/s, which is a constant for all z-scores in the distribution. For example, for the class interval 90-94,

f_e = y × (iN/s) = 0.0283 × (5 × 150)/12.2 = 0.0283 × 61.5 ≈ 1.7.

The computed expected frequencies (f_e) correspond to the heights of the ordinates of the best-fitting normal distribution at the z-scores of the respective class intervals. These computed expected frequencies are recorded in Column 7 of Table 7.1.

(g) In Figure 7.11, the axes have been set up in the usual manner for constructing a frequency polygon. First the f_o values are plotted and these points are connected with a ruler, as in the method for plotting a frequency polygon. Then the values of f_e in Column 7 are plotted and connected by means of a smooth curve. Figure 7.11 thus shows the curve of best fit for these data superimposed upon the frequency polygon for the original data; the f_o and f_e values are plotted against the midpoints of the respective class intervals.

Figure 7.11 Frequency polygon and normalised curve for the data in Table 7.1 (horizontal axis: midpoints of class intervals)

7.6 Relationships among the Constants of the NPC

In the NPC, Q may be used as the unit of measurement in determining areas within given parts of the normal curve. In the NPC, the quartile deviation (Q) is generally called the probable error or PE. The relationship between the PE and σ is given in the following equations:

PE = 0.6745σ
σ = 1.4826 PE

from which it is seen that σ is always about 50% larger than the PE.

7.7 Applications of the Normal Probability Curve

The NPC has wide significance and many applications in the field of measurement in the behavioural sciences. Some of its important applications are discussed in the following text with illustrative examples. The solution of the problems given in these examples requires knowledge of the conversion of raw scores into z-scores and vice versa, and knowledge of how to use the normal-curve table (Table A given in the appendix) for finding the fractional parts of the total area of the curve in relation to sigma (σ) distances.

1. To determine the percentage of cases in a normal distribution below, above or within given limits.
Example 7.14 Let us assume that IQs are normally distributed in the population with a mean of 100 and a standard deviation of 15. Find the percentage of people with IQs (a) below 90, (b) above 120 and (c) between 75 and 125.

Solution

(a) X̄ = 100, σ = 15. First of all, the raw score 90 should be transformed into a z-score.

z-score equivalent to raw score 90 = (X − X̄)/σ = (90 − 100)/15 = −0.67

Figure 7.12 Showing the percentage of cases below 90, above 120 and between 75 and 125

The score 90 falls 0.67z below the mean. From the normal-curve table (Table A given in the appendix), the proportion of the total area between the mean and −0.67z is 0.2486. Since the normal curve is bilaterally symmetrical, each half below and above the mean contains 50% of the cases; hence the proportion of the area that lies below −0.67z is 0.2514 (i.e., 0.5000 − 0.2486). Therefore, 25.14% of people have IQs below 90, and the chances are about 25 in 100 that any score in the distribution will be below 90.

(b) X̄ = 100, σ = 15.

z-score equivalent to raw score 120 = (X − X̄)/σ = (120 − 100)/15 = +1.33

Thus, the score 120 falls 1.33z above the mean. From the normal-curve table (Table A), it is found that the proportion of the total area lying between X̄ and +1.33z is 0.4082. Hence, the proportion of the area lying above +1.33z is 0.0918 (i.e., 0.5000 − 0.4082). Therefore, 9.18% of people have IQs above 120, and the chances are about 9 in 100 that any score in the distribution will be larger than 120 (see Figure 7.12).

(c) X̄ = 100, σ = 15. First of all, the raw scores 75 and 125 should be transformed into their respective z-scores.

z-score equivalent to raw score 75 = (X − X̄)/σ = (75 − 100)/15 = −1.67

z-score equivalent to raw score 125 = (X − X̄)/σ = (125 − 100)/15 = +1.67

From the normal-curve table (Table A), it is found that the proportion of the total area lying between X̄ and −1.67z is 0.4525. Since the normal curve is bell shaped and bilaterally symmetrical, the proportion of the total area lying between X̄ and +1.67z is also 0.4525. Hence, the proportion of the total area lying between −1.67z and +1.67z is 0.9050 (i.e., 0.4525 + 0.4525).
Therefore, 90.5% of people have IQs between 75 and 125, and the chances are about 90 in 100 that any score in the distribution will be found between 75 and 125 (see Figure 7.12).

2. To find the limits in any normal distribution which include a given percentage of the cases.

Example 7.15 Given a normal distribution of achievement scores in the population with a mean of 100 and a standard deviation of 20, find the score limits that include the (a) lowest 20%, (b) highest 10% and (c) middle 50% of the cases.

Solution

(a) X̄ = 100; σ = 20. Since 50% of the cases of the normal distribution lie in each half of the distribution, the lowest 20% of the cases lie in the left half. The lowest 20% of the cases imply that the remaining 30% of the cases in that half lie between the upper limit of this group and the mean of the distribution. This 30% of the cases corresponds to a proportion 0.30 of the total area. From the normal-curve table (Table A in the appendix), it is found that a proportion of about 0.30 of the area lies between X̄ and −0.84σ. Therefore, the required score is

X = X̄ + zσ = 100 + (−0.84 × 20) = 100 − 16.8 = 83.2, or 83.

Figure 7.13 Showing the score limits that include the lowest 20%, highest 10% and middle 50% of the cases

These are the limits, in terms of scores, which include the lowest 20% of the cases: the upper limit is the score point 83.2 or 83, and the lower limit is the lowest score of the distribution. In other words, the lowest 20% of the distribution lies below the score 83.2 or 83 (see Figure 7.13).

(b) X̄ = 100; σ = 20.
The highest 10% of the cases lie in the right half of the distribution, and this highest 10% implies that 40% of the cases lie between its lower limit and the mean of the distribution. This 40% of the cases corresponds to a proportion 0.40 of the total area. From the normal-curve table (Table A), we know that a proportion 0.3997 of the area lies between the mean and +1.28σ. Therefore, the lower score limit of the highest 10% of the cases is

X = X̄ + zσ = 100 + (1.28 × 20) = 100 + 25.6 = 125.6, or 126.

Thus, the lower score limit of the highest 10% of the cases of the distribution is 125.6 or 126, and its upper score limit is the highest score of the distribution. In other words, the highest 10% of the cases of the distribution lies above the score 125.6 or 126 (see Figure 7.13).

(c) X̄ = 100; σ = 20. The middle 50% of the cases in a normal distribution include the 25% just above and the 25% just below the mean. The 25% of the cases corresponds to a proportion 0.25 of the total area. From Table A of the appendix, we find that 0.25 of the area of the distribution lies between the mean and +0.675σ and, of course, 0.25 of the area also lies between the mean and −0.675σ. The middle 50% of the cases (or proportion 0.50 of the total area), therefore, lie between X̄ − 0.675σ and X̄ + 0.675σ. Thus, the score limits that include the middle 50% of the distribution are

X̄ ± 0.675σ, i.e., 100 − (0.675 × 20) and 100 + (0.675 × 20), or 100 − 13.5 and 100 + 13.5, or 86.5 and 113.5 (shown as 86 and 113 in Figure 7.13).

Hence, the middle 50% of the cases in the given distribution lie between the scores 86.5 and 113.5. This is depicted in Figure 7.13.

Figure 7.14 Showing percentile rank of score point 100

3. To determine the percentile ranks of individuals.
Example 7.16 Given a normal distribution with X̄ = 80 and σ = 16, find the percentile rank of an individual scoring 100.

Solution X̄ = 80; σ = 16. The z-score equivalent to the raw score 100 is z = (100 − 80)/16 = +1.25. From Table A, the proportion of the area between the mean and +1.25σ is 0.3944. The percentile rank is given by the proportion of the area in the left half of the distribution plus the proportion of the area between the mean and +1.25σ, that is, 0.5000 + 0.3944 = 0.8944, which means that 89.44% of individuals have scores below the score point 100. Therefore, we may say that the percentile rank of an individual scoring 100 is 89 (see Figure 7.14).

Figure 7.15 Showing score point of percentile P₃₀

4. To determine the percentile points in terms of scores of a given percentage of cases.

Example 7.17 Given a normal distribution with a mean of 80 and a standard deviation of 16, determine the percentile P₃₀.

Solution X̄ = 80; σ = 16. For P₃₀ we have to find the score point below which 30% of the cases lie. In the NPC (see Figure 7.15), these 30% of cases fall in the left half. This implies that the remaining 20% of the cases in that half lie between P₃₀ and the mean. From the table of the normal curve (Table A), it is found that 20% of the cases, or a proportion 0.20 of the area, lie between the mean and −0.525σ. Therefore, the required score is

X = X̄ − 0.525σ = 80 − (0.525 × 16) = 80 − 8.4 = 71.6, or 72.

Therefore, the score point of P₃₀ is 72.

5. To compare scores on two different tests.

Example 7.18 A student obtains 90 marks in Mathematics and 50 marks in English. If the mean and standard deviation of the scores in Mathematics are 70 and 20, and of the scores in English are 30 and 10, respectively, in which subject did he do better?

Solution From the given data, a direct comparison of his relative status in Mathematics and in English cannot be made, because the marks achieved by him do not belong to the same scale of measurement. Therefore, his raw scores in both subjects should be transformed into a common scale, the standard z-score.

In Mathematics: X = 90; X̄ = 70; σ = 20, so z = (90 − 70)/20 = +1.0.

In English: X = 50; X̄ = 30; σ = 10, so z = (50 − 30)/10 = +2.0.
Figure 7.16 Showing the difficulty values of the problems

Thus, the z-score of the student in Mathematics is +1.0 and in English is +2.0. It follows that he did better in English than in Mathematics.

6. To compare two distributions in terms of 'overlapping'.

Example 7.19 Given the distributions of scores of boys and girls on a test, the boys' mean score is 21.49 with a σ of 3.63. The girls' mean score is 23.68 with a σ of 5.12. Assuming normal distributions, find what percentage of boys exceeds the median of the girls' distribution.

Solution

Boys: N = 300; X̄ = 21.49; σ = 3.63; Mdn = 21.41.
Girls: N = 250; X̄ = 23.68; σ = 5.12; Mdn = 23.66.

Both distributions are assumed to be normal. The girls' median, 23.66, is 2.17 score units (23.66 − 21.49) above the boys' mean. Dividing 2.17 by 3.63 (the σ of the boys' distribution), we find that the girls' median lies about 0.60σ above the boys' mean. From Table A, 22.57% of a normal distribution lies between the mean and 0.60σ; hence 27.43% (50 − 22.57), or about 27%, of the boys exceed the girls' median.

7. To determine the relative difficulty of test questions, problems and other test items.

Example 7.20 Three problems A, B and C have been solved by 10%, 20% and 30%, respectively, of a large group. If we assume the capacity measured by the test problems to be distributed normally, what is the relative difficulty of problems A, B and C?

Solution Our first task is to find for Problem A a point in the distribution such that 10% of the entire group (the per cent passing) lies above and 90% (the per cent failing) lies below the given point. The highest 10% in a normally distributed group has 40% of the cases between its lower limit and the mean (see Figure 7.16). From the normal-curve table (Table A), we find that 39.97% (i.e., 40%) of a normal distribution lies between the mean and +1.28σ. Hence, Problem A belongs at a point on the base line of the curve at a distance of +1.28σ from the mean, and accordingly, +1.28σ may be set down as the difficulty value of this problem.

Figure 7.17 Showing percentage or number of students belonging to five subgroups having equal range of ability
Problem B, passed by 20% of the group, falls at a point in the distribution 30% above the mean From Table Ait is found that 29.95% (ie, 30%) of the group falls between the mean and “S40; hence problem B has a difficulty value of +0840. Problem C, passed by 30% of the group, falls at a point in the distribution 20% above the mean From Table A, it is found that 19,85% (i.e, 20%) of the group lies between the mean and 1f52o;hence, Problem Chas a difficulty value of +0.520. Let us summarise our results as follows: [- Problems Passed By (%) Values Difference A 10 +128 = B 20 +084 044 30 +052 032 _Thecdifference in difficulty between Problems B. and C is 0.32, which is roughly 3/4 of the o difference in difficulty between Problems A and B. Since the percentage difference is the same in thetwo comparisons, itis evident that when ability is assumed to follow the normal curve, the o differences, ut not the percentage differences, are the better indices of differences in difficulty. &.To separate a given group into subgroups according to capacity, when the trait is normally distributed. Example 7.21 Suppose that we have administered an entrance test for admission in the PG Department of Psychology, Utkal University, to a group of 400 undergraduate students. We Wish to classify the group into five subgroups A,B,C, Dand E, according toability, the range of ability to be equal in each subgroup. On the assumption that the trait measured by our testis Normally distributed, how many students should be placed in subgroups A, B, C, Dand E? Solution Let us first represent the positions of the five subgroups diagrammatically on a ‘gtmal curve as shown in Figure 717. Ifthe base line of the curve is considered to extend from 10 +3o, that is, over a range of 60, dividing this range by S (the number of subgroups) gives pr ES 240 STATISTICS FOR BEHAVIOURAL AND SOCIAL SCIENC! ‘ qT ls May be ja; : o be allotted to each group. 
These five intervals may be laid off on the base line as shown in the figure, and perpendiculars erected to demarcate the various subgroups. Subgroup A covers the upper 1.2σ; thus, the point on the base line above which it falls is +1.8σ (i.e., 3σ − 1.2σ = 1.8σ). Subgroup B covers the next 1.2σ; thus, the point on the base line above which it falls is +0.6σ (i.e., 1.8σ − 1.2σ = 0.6σ). Subgroup C lies 0.6σ to the right and 0.6σ to the left of the mean. Subgroups D and E occupy the same relative positions in the lower half of the curve that B and A occupy in the upper half, respectively. In other words, subgroup A falls between +1.8σ and +3σ; B falls between +0.6σ and +1.8σ; C falls between −0.6σ and +0.6σ; D falls between −0.6σ and −1.8σ; and E falls between −1.8σ and −3σ.

Now find out what percentage of the whole group belongs to each of the five subgroups. To find what percentage of the whole group belongs in subgroup A, we must find what percentage of a normal distribution lies between +3σ (the upper limit of subgroup A) and +1.8σ (the lower limit of subgroup A). From the normal-curve table (Table A), it is found that 49.87% of a normal distribution lies between the mean and +3σ, and 46.41% between the mean and +1.8σ. Hence, 3.46% (49.87 − 46.41%), or roughly 3.5%, of the total area under the normal curve lies between +1.8σ and +3σ; accordingly, subgroup A comprises 3.5% of the whole group.

The percentages in the other subgroups are calculated in the same way. Subgroup B covers the cases lying between +0.6σ and +1.8σ. From Table A, we find that 46.41% of the normal distribution falls between the mean and +1.8σ (the upper limit of subgroup B), and 22.57% falls between the mean and +0.6σ (the lower limit of subgroup B). Subtracting, we find that 23.84% (46.41 − 22.57%), or about 23.8%, of the distribution belongs in subgroup B. Subgroup C lies between −0.6σ and +0.6σ; 22.57% of the cases lies between the mean and +0.6σ, and the same percentage of cases lies between the mean and −0.6σ. Therefore, subgroup C includes 22.57% × 2 = 45.14%, or about 45%, of the distribution. Finally, subgroup D, which lies between −0.6σ and −1.8σ, contains exactly the same percentage of the whole group as subgroup B; subgroup E, which lies between −1.8σ and −3σ, contains the same percentage of the whole group as subgroup A.
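The band limits and Table A subtractions described above can be cross-checked with a short computation. This is a minimal sketch, assuming Python 3.8 or later and the standard-library statistics.NormalDist; the function name subgroup_shares and its parameters are hypothetical. Because it works with unrounded areas, its figures may differ slightly from the rounded table values (for example, about 13.8 rather than 14 students in subgroup A).

```python
from statistics import NormalDist


def subgroup_shares(k=5, limit=3.0):
    """Per cent of a normal distribution falling in k equal-width bands
    laid off between -limit and +limit sigma, uppermost band first
    (hypothetical helper)."""
    std = NormalDist()             # standard normal: mean 0, sigma 1
    width = 2 * limit / k          # 6 sigma / 5 subgroups = 1.2 sigma per band
    shares = []
    for i in range(k):
        upper = limit - i * width  # upper limit of band i
        lower = upper - width      # lower limit of band i
        shares.append(100 * (std.cdf(upper) - std.cdf(lower)))
    return shares


group_size = 400
for name, share in zip("ABCDE", subgroup_shares()):
    students = group_size * share / 100
    print(f"Subgroup {name}: {share:5.2f}% of the group, {students:6.1f} students")
```

The shares do not sum to exactly 100% because the base line is cut off at ±3σ, which is the same assumption the worked solution makes.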
Therefore, subgroup D contains 23.8% and subgroup E contains 3.5% of the whole group.

Subgroups                                                    A      B      C      D      E
Percentage of the whole group in each subgroup              3.5   23.8   45.0   23.8    3.5
Number of students in each subgroup out of a total of 400   14    95.2   180    95.2   14
Number of students in whole numbers                         14    95     180    95     14

7.8 Common Causes of Asymmetry

Divergence or deviation of a frequency distribution from the normal form is known as asymmetry. Deviations from normality are called the errors of normal distribution; the errors are of two types, skewness and kurtosis, which have already been discussed earlier in this chapter. Now the question arises as to why frequency distributions deviate from the normal form. The following are some of the common causes of asymmetry:

(i) Selection. Selection is a potent cause of skewness and kurtosis. Selection will produce skewness and kurtosis in distributions even when the test has been adequately constructed and carefully administered. Therefore, the samples must be selected randomly from the population; each element of the population should have an equal chance of being selected and included in the sample. Moreover, the sample size should be large (i.e., more than 30). In other words, the samples should be large and selected without bias.

(ii) Unsuitable or poorly made tests. The test administered should be suitable for the trait being measured. If a test is too easy, scores will pile up at the high-score end of the scale, whereas when the test is too hard, scores will pile up at the low-score end of the scale. Therefore, the test should consist of items having 50% or intermediate difficulty levels. Moreover, the test items should have discriminatory ability, so that they can discriminate subjects as being high, medium or low in the trait being measured. Ambiguous or poorly made items should be eliminated.

(iii) Non-normal distributions. Skewness or kurtosis or both will appear when there is a real lack of normality in the trait being measured.
Non-normality of distribution will arise, for instance, when some of the hypothetical factors determining the strength of a trait are dominant or pre-potent over the others, and hence are present more often than chance will allow. Illustrations may be found in distributions resulting from the throwing of loaded dice. When off-centre or biased dice are cast, the resulting distribution will certainly be skewed and probably peaked, owing to the greater likelihood of combinations of faces yielding certain scores. The same is true of biased coins.

(iv) Errors in the construction and administration of tests. Various factors in addition to those mentioned above make for distortions in score distributions. Differences in the size of the units in which test performance has been expressed, for example, will lead to irregularities in the score distribution. If the items are very easy at the beginning and very difficult later on, an increment of one point of score at the upper end of the test scale will be much greater than an increment of one point at the low end of the scale. The effect of such unequal or 'rubbery' units is to jam the distribution and reduce the spread. Scores tend to pile up at some intermediate point and to be stretched out at the low end of the scale.

Errors in timing or in giving instructions, errors in the use of scoring keys, and large differences in practice or in motivation: all of these factors, if they cause some subjects to score higher and others to score lower than they normally would, tend to make for skewness in the distribution.

7.9 The Normal Approximation to the Binomial

It has been observed that as n increases in size, the symmetrical binomial is more closely approximated by the normal distribution. This means that the normal distribution may be used to estimate binomial probabilities.
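As a rough illustration of this idea, the sketch below compares exact binomial probabilities with normal-curve estimates for 10 tosses of an unbiased coin. It assumes Python 3.8 or later (math.comb and statistics.NormalDist from the standard library) and treats each count k as covering the interval from k − 0.5 to k + 0.5; the function names are hypothetical. Because it does not round z to two decimal places as a table lookup does, its estimates may differ from hand-computed values in the third decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_binomial(k, n, p=0.5):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(k, n, p=0.5):
    """Normal-curve estimate of the same probability, taking k to cover
    the exact limits k - 0.5 to k + 0.5 (hypothetical helper)."""
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return curve.cdf(k + 0.5) - curve.cdf(k - 0.5)


n = 10
for k in range(n, -1, -1):
    print(f"{k:2d} heads: exact = {exact_binomial(k, n):.3f}, "
          f"normal = {normal_approx(k, n):.3f}")

# Probability of 7 or more heads under each method.
exact_tail = sum(exact_binomial(k, n) for k in range(7, n + 1))
normal_tail = sum(normal_approx(k, n) for k in range(7, n + 1))
print(f"7 or more heads: exact = {exact_tail:.3f}, normal = {normal_tail:.3f}")
```

Changing n or p in the calls gives a quick way to see how the agreement between the two columns varies with the number of trials and with departures of p from 1/2.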
If we are given a situation where 10 coins are tossed simultaneously, the exact binomial probabilities and their normal approximations for the occurrence of different numbers of heads will be as shown in Table 7.2. For the detailed computation of the binomial probability of occurrence of different numbers of heads, the reader is referred to Section 6.7.

Now let us discuss the procedure of determining the normal probabilities of occurrence of different numbers of heads while tossing 10 unbiased coins at the same time. Here, n=10; the mean (X̄) of the binomial = n/2 = 10/2 = 5; the standard deviation (σ) = √n / 2 = √10 / 2 = 1.58; and the variance (σ²) = n/4 = 10/4 = 2.5.

Table 7.2 Comparison of binomial probabilities with corresponding normal approximations for n=10 and p=1/2

Number of Heads   Exact Binomial Probability   Normal-Curve Approximation
10                1/1024 = 0.001               0.002
9                 10/1024 = 0.010              0.011
8                 45/1024 = 0.044              0.044
7                 120/1024 = 0.117             0.114
6                 210/1024 = 0.205             0.203
5                 252/1024 = 0.246             0.251
4                 210/1024 = 0.205             0.203
3                 120/1024 = 0.117             0.114
2                 45/1024 = 0.044              0.044
1                 10/1024 = 0.010              0.011
0                 1/1024 = 0.001               0.002
Total             1.000

Because the normal distribution is continuous, and not discrete, we consider any value as covering the exact limits 0.5 below to 0.5 above the given value. For example, the value 10 ranges from 9.5 to 10.5. These two limits should be transformed into z-scores. The z-score equivalent to a raw score of 9.5 is (X − X̄)/σ = (9.5 − 5)/1.58 = 2.85. Similarly, the z-score equivalent to a raw score of 10.5 is (X − X̄)/σ = (10.5 − 5)/1.58 = 3.48. Then, from the normal-curve table (Table A of the appendix), we find that the proportion of the area of the normal curve falling between the mean and +2.85σ is 0.4978, and the area between the mean and +3.48σ is 0.5000. Thus, the area lying between +2.85σ and +3.48σ is 0.0022 (i.e., 0.5000 − 0.4978). Hence, the normal probability of occurrence of 10 heads is 0.002.

Likewise, the value 9 covers the exact limits 8.5 to 9.5. In standard-score form, the value 8.5 is equivalent to z = (8.5 − 5.0)/1.58 = 2.21, and 9.5 is equivalent to z = (9.5 − 5.0)/1.58 = 2.85.
The proportion of the area of the normal curve falling between the mean and +2.21σ is 0.4864, and between the mean and +2.85σ is 0.4978. Thus, the area lying between +2.21σ and +2.85σ is 0.0114 (i.e., 0.4978 − 0.4864). Hence, the normal approximation of the binomial probability of occurrence of 9 heads is 0.011.

The score 8 covers the exact limits 7.5 to 8.5. In standard-score form, the value 7.5 is equivalent to a z of +1.58 and the value 8.5 is equivalent to a z of +2.21. The proportion of the area of the normal curve falling between the mean and +1.58σ is 0.4429, and between the mean and +2.21σ is 0.4864. Thus, the area lying between +1.58σ and +2.21σ is 0.0435 (i.e., 0.4864 − 0.4429). Hence, the normal approximation of the binomial probability of occurrence of 8 heads is 0.044.

The score 7 covers the exact limits 6.5 to 7.5. In standard-score form, the value 6.5 is equivalent to a z of +0.95 and the value 7.5 is equivalent to a z of +1.58. The proportion of the area of the normal curve falling between the mean and +0.95σ is 0.3289, and between the mean and +1.58σ is 0.4429. Thus, the area lying between +0.95σ and +1.58σ is 0.1140 (i.e., 0.4429 − 0.3289). Hence, the normal approximation of the binomial probability of occurrence of 7 heads is 0.114.

In a similar vein, the value 6 covers the exact limits 5.5 to 6.5. In standard-score form, the value 5.5 is equivalent to a z of +0.32 and the value 6.5 is equivalent to a z of +0.95. The proportion of the area of the normal curve falling between the mean and +0.32σ is 0.1255, and between the mean and +0.95σ is 0.3289. Thus, the area lying between +0.32σ and +0.95σ is 0.2034 (i.e., 0.3289 − 0.1255). Hence, the normal approximation of the binomial probability of occurrence of 6 heads is 0.203.

Similarly, the value 5 covers the exact limits 4.5 to 5.5.
In standard-score form, the value 4.5 is equivalent to a z of −0.32 and the value 5.5 is equivalent to a z of +0.32. The proportion of the area of the normal curve lying between −0.32σ and +0.32σ is 0.2510 (i.e., 0.1255 × 2). Hence, the normal approximation of the binomial probability of occurrence of 5 heads is 0.251.

Since the normal curve is bilaterally symmetrical and bell shaped, the normal approximations of the binomial probabilities of occurrence of 4, 3, 2, 1 and 0 heads are equal to those of occurrence of 6, 7, 8, 9 and 10 heads, respectively. The exact binomial probabilities and the normal-curve approximations to these binomial probabilities are given in Table 7.2.

Let us now compare the normal-curve approximation with the exact binomial probability of obtaining 7 or more heads in tossing 10 coins simultaneously. Using the normal-curve approximation to the binomial, we estimate the probability of obtaining 7 or more heads in tossing 10 unbiased coins as 0.171 (i.e., 0.114 + 0.044 + 0.011 + 0.002). We may compare this with the exact probability obtained directly from the binomial expansion shown in Table 7.2. This binomial probability is 0.172 (i.e., 0.117 + 0.044 + 0.010 + 0.001). Here, we note that the discrepancy between the estimate obtained from the normal curve and the exact binomial probability is negligible.

Table 7.2 compares the binomial and normal probabilities for n=10 and p=1/2. We note that in this instance the differences between the exact binomial probabilities and the corresponding normal approximations are small.

The accuracy of the approximation depends on both n and p; as n increases in size, the accuracy of the approximation improves. For any n, as p departs from 1/2, the approximation becomes less accurate.

Summary

In this chapter, we have discussed the standard scores (or z-scores) and the NPC. A z-score is a transformation of a raw score. It designates how many standard deviation units the corresponding raw score is above or below the mean.
A z distribution has the following characteristics: the z-scores have the same shape as the set of raw scores, the mean of z-scores always equals zero, and the standard deviation of z-scores always equals 1. In addition, we discussed the purposes and applications of z-scores. Next, we discussed the NPC. We pointed out that the normal curve is a bell-shaped curve and gave the equation describing it. Moreover, we discussed the properties of the normal curve, divergence from normality (skewness and kurtosis), common causes of asymmetry, procedures for normalising a distribution of scores, relations that exist among the constants of the NPC, and the area contained under the normal curve and its relation to z-scores. Finally, we showed how to use z-scores in conjunction with a normal distribution to find: (a) the percentage or frequency of scores corresponding to any raw score in the distribution and (b) the raw score corresponding to any frequency or percentage of scores in the distribution. In addition, other applications of the NPC and the normal approximation to the binomial were also discussed.

Key Terms

• Asymmetry
• Asymptotic
• Constants
• Deviation score
• Divergence from normality
• Kurtosis
• Normal curve
• Normal probability curve
• Raw score
• Skewness
• Standard score
• Standardised distribution
• z-score
• z-score transformation

Questions and Problems

1. What is a z-score? Discuss the characteristics and purposes of z-scores.
2. Explain the applications of z-scores.
3. Describe exactly what information is provided by a z-score.
4. Define the normal curve. Discuss the importance of the normal distribution.
5. What is the equation of the NPC? Explain the properties of the normal curve.
6. What do you mean by divergence from normality? Discuss the different types of errors in normal distribution.
7. What is skewness? Elucidate the properties of the skewed distribution.
8. What is kurtosis?
Analyse the different types of kurtosis.
9. Critically analyse the various applications of the NPC.
10. What do you mean by asymmetry? Examine the various causes of asymmetry.
11. For a population with μ=50 and σ=8, find the z-score that corresponds to each of the following X values: 58, 34, 70, 46, 62 and 44.
12. For a population with μ=80 and σ= 0, find the scores (X value) that correspond to each of the following z-scores: 2.50, −0.50, −1.50, 0.25, 05s and 1.00.
13. A distribution of scores has a standard deviation of σ=10. Find the z-score corresponding to each of the following values:
(a) A score that is 20 points above the mean.
(b) A score that is 10 points below the mean.
(c) A score that is 15 points above the mean.
(d) A score that is 30 points below the mean.
14. For a population with μ=90, a raw score of X=93 corresponds to z=+0.50. What is the standard deviation for this distribution?
15. For a population with σ=8, a raw score of X=43 corresponds to z=−0.25. What is the mean for this distribution?
16. Find the height of the ordinate of the normal curve at the following z-values: −2.15, −1.53, +0.07, +0.99 and +2.76.
17. Consider a normally distributed variable with X̄=50 and s=10. For N=200, find the height of the ordinates at the following values of X: 25, 35, 49, 57 and 63.
