Adobe Scan 06-Dec-2023 (1)

This document outlines the learning objectives and key concepts related to the presentation of data, focusing on the distinction between scores and data, and the importance of converting raw data into useful information. It discusses the construction and use of frequency distributions, including qualitative and quantitative classifications, and various graphical representations such as histograms and pie charts. Additionally, it provides guidelines for creating frequency distribution tables for both discrete and continuous variables.

Uploaded by

Kaushiki Riya
Copyright
© All Rights Reserved
STATISTICS FOR BEHAVIOURAL AND SOCIAL SCIENCES

Chapter 2 Presentation of Data

Learning Objectives
After reading this chapter, you will be able to:
• Show the difference between scores and data.
• Convert raw data to useful information.
• Understand important concepts and principles of the graphic presentation of data.
• Develop the skills of tabulating data into frequency tables.
• Construct and use frequency distributions.
• Graph frequency distributions with histograms, polygons, ogives, bar diagrams and pie diagrams.
• Describe the advantages and limitations of each graphical representation of data.
• Use frequency distributions to make decisions.

2.1 Introduction
Data obtained from experiments or surveys are frequently collections of numbers or numerical scores. Mere inspection of a set of numerical scores ordinarily communicates very little to an investigator. Moreover, numerical data are collected with a definite purpose, but that purpose is not achieved by merely collecting the scores. It is extremely difficult to make sense of individual scores or numbers, and it is equally difficult to learn the group characteristics directly from the raw scores. Therefore, for understanding, analysing and interpreting data, for more precise communication and for comparison with other sets of data, raw data frequently need to be arranged, classified, described and graphically represented. Before describing the tabular and graphical forms of presenting data, we should discuss scores and data.

2.2 Scores and Data
Data are figures, ratings, checklists and other information collected in experiments, surveys and descriptive studies. Measurements that are made on the subjects of an experiment are called data (Pagano, 1994). Usually, data consist of the measurements of the dependent variable or of other subject characteristics, such as age, sex, number of subjects and so on.
Data as originally measured are often referred to as raw or original scores. In common parlance, the term data is generally used synonymously with statistics. For example, one may say that he or she has seen the data on industrial accidents in India during the last year instead of saying that he or she has seen the statistics on industrial accidents in India. Thus, ‘data’ is a broad term signifying a number of attributes about a fact or a phenomenon.
A score, on the other hand, is a numerical value representing a point or distance along a continuum. A score is usually a part, and sometimes the most basic part, of data. The structure of data is most commonly built up on the basis of scores. As scores are the numerical representation of the attributes of a variable, they depend on the nature of the variable. There are two types of scores, continuous scores and discrete scores, corresponding to the nature of the variable being measured.

2.2.1 Continuous Score
A continuous variable is one that theoretically can have an infinite number of values between adjacent units on the scale. A continuous score is thought of as a distance along a continuum, rather than as a discrete point. An inch is the linear magnitude between two divisions on a foot rule, and in the same manner, a score in an intelligence test is a unit distance between two limits. For example, a score of 120 upon an intelligence examination represents the interval from 119.5 up to 120.5. The exact midpoint of this score interval is 120, as shown in the diagram below.

                Score 120
      |------------+------------|
    119.5         120         120.5
              Inclusive Type

This score interval is known as the inclusive type because both the upper and lower limits are included in the given score. Other scores may be interpreted in the same way. A score of 22, for instance, includes all values from 21.5 to 22.5, that is, any value from a point 0.5 unit below 22 to a point 0.5 unit above 22. This means that 21.7, 22.0 and 22.4 would all be scored 22.
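The inclusive interpretation can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and sending a value that falls exactly on a limit (such as 22.5) to the higher score is an assumption made here, since the text does not settle that case.

```python
import math


def inclusive_limits(score, unit=1.0):
    """Return the (lower, upper) limits of a score under the inclusive view."""
    half = unit / 2
    return (score - half, score + half)


def to_inclusive_score(value):
    """Map a measured value to the whole-number score whose interval holds it.

    Assumption: a value exactly on a limit (e.g. 22.5) goes to the higher score.
    """
    return math.floor(value + 0.5)


print(inclusive_limits(120))                                 # (119.5, 120.5)
print([to_inclusive_score(v) for v in (21.7, 22.0, 22.4)])   # [22, 22, 22]
```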
The usual mathematical meaning of a continuous score is an interval that extends along some dimension from 0.5 unit below to 0.5 unit above the face value of the score. There is another and somewhat different meaning that a test score may have. According to this second view, a score of 120 means that an individual has done at least 120 items correctly, but not 121. Hence, a score of 120 represents any value between 120 and 121. Any fractional value greater than 120 but less than 121, for example, 120.3 or 120.8, falls within the interval 120-121 and is scored simply as 120. The middle of the score is 120.5, as shown in the diagram below.

                Score 120
      |------------+------------|
     120         120.5         121
              Exclusive Type

This score interval is called the exclusive type because the upper limit is excluded from the given score and is included in the immediate next higher score.
Both of the above ways of defining a score are valid and useful. Which one is to be used will depend upon the way in which the test is scored and on the meaning of the units of measurement employed. For example, if each of 10 boys is recorded as having a height of 65 inches, this will ordinarily mean that these heights fall between 64.5 and 65.5 inches (middle value 65 inches) and not between 65 and 66 inches (middle value 65.5 inches). On the other hand, the ages of 20 children, all recorded as being 8 years old, will most probably lie between 8 and 9 years, that is, will be greater than 8 and less than 9 years (middle value 8.5 years). But ‘8 years old’ must be taken in many studies to mean 7.5 up to 8.5 years. The point to remember is that results obtained from treating [text illegible in the source] often to decide, sometimes arbitrarily, what meaning a score should have. In general, it is safer to take the mathematical meaning of a score unless it is clearly indicated otherwise. This is the method followed throughout this book.
For instance, scores of 60 and 120 will remain 59.5-60.5 and 119.5-120.5, respectively, and not 60 up to 61 and 120 up to 121.

2.2.2 Discrete Score
Scores for which individual values fall on the scale only with distinct, real gaps are called discrete scores. Such scores usually take only integer values. They are absolute, not approximate, measures of the attribute in question. For example, 365 runs in a cricket match means 365 runs absolutely. It cannot be any fraction less or more; we can never call it 365.5 runs. Similarly, the number of members present in a meeting is a definite integer. We cannot say that there were 56.3 members present in the meeting; rather, we can say that 56 members were present. Likewise, we can say that the size of a family is 4 or that the number of children in a family is 2. These are discrete scores. As a matter of fact, however, most psychological and educational measurements do not use discrete scores because of the underlying continuity in the nature of the attributes of a variable. In some cases, discrete scores are used purposively to facilitate the collection of data when, in fact, the attribute lies on a continuous scale.

2.3 Drawing Up Frequency Distributions
One of the most fundamental and important methods of condensing, summarising and putting order into a disarray of data is the frequency distribution. A frequency distribution is a series in which scores with similar or closely related values are put in separate bunches or groups, arranged in order of magnitude in a grouped series. An orderly arrangement in magnitude is called an array or a series. Such a series has two parts: on its left, there are the magnitudes of the values and, on its right, there is the number of times a value or a group of values is repeated (the frequency). The frequency is juxtaposed to its value or group.
In other words, a frequency distribution is a table showing the number (frequency) of individuals, cases, events or scores of a sample in each of the classes into which the variable under investigation has been classified. The main purpose of a frequency table is to simplify the presentation and to facilitate comparison. According to the nature of the variable under study, the frequency distribution is of two principal types, the qualitative frequency distribution and the quantitative frequency distribution, which are discussed in the following sections.

Table 2.1 Frequency distribution of religion groups

Religion group        Frequency (f)
[illegible]           120
[illegible]           50
Christian             30
Total (N)             200

2.3.1 Qualitative Frequency Distribution
A qualitative frequency distribution is a table showing the number of individuals or cases of a sample in different classes of an attribute or nominal variable, such as honesty, beauty, sex, employment, intelligence, occupation, literacy, blood group, phenotype, hair colour, eye colour, personality, race, religion and so forth. Attributes or nominal variables that are not capable of quantitative measurement are termed qualitative or descriptive attributes. An attribute or a nominal variable cannot be measured or expressed quantitatively in numerical units, so it is divided into classes depending only on the qualitative distinction between the individuals of a sample with respect to the presence or absence of a particular property or characteristic. Therefore, the classes are given names or numbers as simple labels without any quantitative significance or numerical range, and are separated from each other by intervening gaps. The classes may be assumed to be located at specific points on the scale or domain of that variable; no individual or case falls in the gap between two such points. Such frequency distributions of attributes are, thus, known as point distributions.
For such a qualitative frequency distribution, the classes of the variable are entered in one column of a simple frequency table, while the number of individuals belonging to each such class is counted from the sample and then entered as a frequency in another column against that particular class (see Table 2.1).

2.3.2 Quantitative Frequency Distribution
If the data are classified on the basis of a particular phenomenon that is capable of quantitative measurement, such as age, height, weight and so on, the classification is termed quantitative. The quantitative frequency distribution consists of a table showing the number of individuals or cases of a sample in the different classes into which the entire range of scores (numerical values) of a measurement variable has been grouped. The data, arranged into such a frequency distribution, are called grouped data because the scores have been distributed among the classified groups of scores of the variable. Classifying scores in this way reveals the salient features of the variable in the sample in a meaningful way, for example, the class having the lowest or highest frequency, and the pattern of distribution across the classes. It also helps in the application of statistics for the analysis and interpretation of the data. The quantitative frequency distribution may broadly be divided into two groups: (a) discrete or discontinuous frequency distribution and (b) continuous frequency distribution.

2.3.2.1 Discrete or Discontinuous Frequency Distribution
A much better way of representing the data is to express it in the form of a discrete or ungrouped frequency distribution. Because consecutive scores of a discrete variable, such as family size, number of children in the family or class examination scores of students, are separated by real gaps due to the impossibility or impracticability of further fractional subdivision of its measure, the classes of such a variable do not show continuity with each other; the classes are delimited sharply with gaps between them. In framing a discrete or discontinuous frequency distribution, this distinction is retained by recording the class intervals in whole units only and by omitting true class limits in fractional units or decimals. This leaves gaps between the classes; no member of the sample can occur at these gaps. In other words, here the frequency refers to a given discrete value and not to a range of values. In a discrete variable, exact measurement is possible in whole numbers.

Table 2.2 Discrete frequency distribution of families having different number of children
[Columns: Number of Children, Number of Families. Total (N) = 150. The individual rows are not legible in the source.]

The frequency distribution of a discrete variable is sometimes arranged in the form of a simple frequency table where each single distinct score is entered individually as if to constitute a class by itself; no class is constituted here by grouping a range of more than one score. The frequency of a particular score is entered directly against that score. Table 2.2 is an example of a discrete frequency distribution of families having different numbers of children; no family can exist with a fractional number of children like 3.5 or 4.3, and real gaps exist between the whole numbers of children.

2.3.2.2 Steps for Discrete Frequency Distribution
The following steps should be followed for framing a frequency distribution table of a discrete variable:
(i) Write the name of the variable or ‘X’ if the name is not given.
(ii) Below the title/name of the variable or ‘X’, arrange all the scores in rows, carefully putting a comma (,) after each score.
(iii) Pick up the highest and lowest values from among the scores and write them below the row(s) of scores.
Find out the range of scores by subtracting the lowest score from the highest score in the entire series of scores (i.e., Range = Highest score - Lowest score) and note it there. Count the number of scores in the entire series and write it below as N = ?
(iv) Write the scores in the first column of the frequency table in serial order, either in ascending order (i.e., from the lowest to the highest score) or in descending order (i.e., from the highest to the lowest score), according to your convenience.
(v) Then cancel out each score in the series by an oblique stroke (\) and put a vertical stroke (|) in the second column of the frequency table, called ‘Tally marks’, against the appropriate score (value of the variable) whenever it occurs. After a particular score has occurred four times, for the fifth occurrence put a cross tally mark (\) over the first four tally marks to give a block of five tallies. When a particular score occurs for the sixth time, put another tally mark against it (after leaving some space from the first block of 5), and for the tenth occurrence again put a cross tally mark over the sixth to ninth tally marks to get another block of 5, and so on. This process is repeated till all the scores in the series are recorded.
(vi) Count the number of tallies, that is, the number of times each score/value is repeated, and write the number against the corresponding score/value in the third column, entitled ‘Frequency’, of the frequency distribution table.

Example 2.1
In a class examination in mathematics, 80 students of class VII obtained the following marks out of a full mark of 10. Prepare a frequency distribution table for the given data.
Marks: [first part of the list illegible in the source] 8, 2, 3, 3, 8, 10, 8, 5, 10, 3, 5, 10, 7, 8, 9, 2, 4, 5, 4, 9, 3, 6, 9, 8, 9, 10, 8, 10, 0, 5, 9, 1, 5, 2, 2, 8

Solution
Variable: Mathematics marks; Highest score: 10; Lowest score: 0
Range = Highest score - Lowest score = 10 - 0 = 10; N = 80.
See Table 2.3.
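The steps above can be sketched in Python as follows. The marks used here are a short illustrative list, not the full data of Example 2.1, and the function names are hypothetical. A completed block of five tallies is drawn as four bars plus a slash for the cross stroke.

```python
from collections import Counter


def tally(frequency):
    """Draw tally marks: each full block of five is '||||/', the rest are bars."""
    blocks, rest = divmod(frequency, 5)
    parts = ["||||/"] * blocks
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


def discrete_frequency_table(scores):
    """Return (score, tally marks, frequency) rows in ascending order of score."""
    counts = Counter(scores)
    rows = []
    for value in range(min(scores), max(scores) + 1):
        f = counts.get(value, 0)
        rows.append((value, tally(f), f))
    return rows


# Illustrative marks out of 10 (an assumption, not the data of Example 2.1).
marks = [3, 5, 10, 7, 8, 9, 2, 4, 5, 4, 9, 3, 6, 9, 8, 9, 10, 8, 10, 0]

table = discrete_frequency_table(marks)
print(f"Range = {max(marks)} - {min(marks)} = {max(marks) - min(marks)}; N = {len(marks)}")
for value, marks_drawn, f in table:
    print(f"{value:>5}  {marks_drawn:<12} {f:>3}")
print(f"Total  {'':<12} {sum(f for _, _, f in table):>3}")
```

Checking that the frequency column sums to N, as the last printed row does, is the same check the manual procedure relies on.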
Table 2.3 Tabulation of a discrete frequency distribution
[Columns: Mathematics Marks (0 to 10), Tally Marks, Frequency. The tally marks and frequencies are not legible in the source.]

2.3.3 Continuous Frequency Distribution
The distribution of frequencies of a continuous variable is known as the continuous frequency distribution. It becomes necessary in the case of variables that can take any fractional value and for which an exact measurement is not possible. There is no real gap between the classes of such a variable, so the consecutive groups or classes are continuous with each other. However, a discrete variable can also be presented in the form of a continuous frequency distribution when the discrete distribution is likely to be too long and unwieldy to handle and somewhat odd in presentation.

2.3.4 Facts about Continuous Frequency Distribution
The following are some basic principles for forming a continuous or grouped frequency distribution.
(a) Number of class intervals. A class should be clearly defined and should not lead to any ambiguity. Furthermore, the classes should be exhaustive and mutually exclusive (i.e., non-overlapping) so that any value of the variable corresponds to one and only one of the classes. In other words, there is a one-to-one correspondence between the value of the variable and the class.
No hard-and-fast rule can be laid down for the number of classes into which a frequency distribution should be divided. If there are too many classes, many of them will contain only a few frequencies, and the distribution may show irregularities that are not attributable to the characteristics of the variable being measured. If there are too few classes, so many frequencies will be crowded into a class as to cause much information to be lost. Hence, the number of classes to be used depends partly upon the nature of the data and partly upon the number of frequencies in the series.
The greater the number of frequencies, the more classes we may have. The following four points are usually considered while deciding about the number of class intervals in a frequency distribution: (i) the total frequency (i.e., the total number of observations in the distribution); (ii) the nature of the data (i.e., the size or magnitude of the values of the variable); (iii) the accuracy desired or aimed at; and (iv) the ease of computation of the various descriptive measures of the frequency distribution, such as the mean, variance and so on, for further processing of the data.
However, from a practical point of view, the number of classes should be neither too small nor too large; a balance should be struck between the loss of information in the first case (i.e., too few classes) and the irregularity of the frequency distribution in the second case (i.e., too many classes) to arrive at a pleasing compromise, giving the optimum number of classes in the view of the statistician. Normally, the number of classes should not be greater than 20 and should not be less than 5, keeping in view points (i) to (iv) mentioned earlier together with the magnitude of the class interval, since the number of classes is inversely proportional to the magnitude of the class interval.
A number of rules of thumb have been proposed for calculating the proper number of classes. An elegant, though approximate, formula is given by Sturges, known as the Sturges rule, which reads

$K = 1 + 3.322 \log_{10} N$,

where
K = number of class intervals (classes)
N = total frequency or total number of observations in the data.
The value obtained by this formula is rounded to the next higher integer.
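As a quick check of the Sturges rule K = 1 + 3.322 log10 N, the sketch below computes the unrounded value and the value rounded to the next higher integer. The function name is hypothetical, and the exact common logarithm from Python's math module is assumed.

```python
import math


def sturges_classes(n):
    """Return (unrounded K, K rounded up) for n observations using Sturges rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 1 + 3.322 * math.log10(n)
    return k, math.ceil(k)


k, k_rounded = sturges_classes(100)
print(f"K = {k:.3f}, rounded up to {k_rounded}")   # K = 7.644, rounded up to 8
```

For 100 observations this suggests 8 classes, which lies comfortably within the range of 5 to 20 recommended above.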
Since the integer part of the common logarithm is 0 for a one-digit number, 1 for a two-digit number, 2 for a three-digit number, 3 for a four-digit number and so on, the Sturges formula keeps the value of K, the number of classes, fairly reasonable.

Example 2.2
A test of syllogistic reasoning is administered to 100 students of Grade VIII of a school. Apply the Sturges formula to decide about the number of class intervals.
Sturges formula: $K = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 100 = 1 + 3.322 \times 2 = 7.644 \approx 8$,
which is a fairly reasonable number from a practical point of view. The rule, however, fails if the number of observations is very large or very small.

(b) Type of class intervals. We have discussed that each score on a continuous scale has an upper limit and a lower limit, which are called score limits. Likewise, each class interval is specified by two extreme values called the class limits, the smaller one being termed the lower limit and the larger one the upper limit of the class. The class limits in a continuous frequency distribution can be presented in any one of the following forms: (i) exclusive or repetition type (A), (ii) inclusive or non-repetition type (B) and (iii) boundary or decimal type (C). These three types are exemplified in the following table.

Example 2.3

(A) Exclusive     (B) Inclusive     (C) Boundary      Midpoints   f
75-80             75-79             74.5-79.5         77          [illegible]
70-75             70-74             69.5-74.5         72          [illegible]
65-70             65-69             64.5-69.5         67          [illegible]
60-65             60-64             59.5-64.5         62          [illegible]
55-60             55-59             54.5-59.5         57          [illegible]
50-55             50-54             49.5-54.5         52          [illegible]
                                                                  N = 20

All the above three forms of presentation are simply three ways of expressing identically the same facts. In the exclusive type (A), the class interval of 50-55, for instance, is supposed to include all scores from 49.5 up to 54.5, and it does not include any score beyond 54.5. The upper limit or upper extreme value 55 is excluded from the respective class and is included in the immediate next class. Therefore, the class intervals depicted under (A) are termed exclusive classes.
In the inclusive type (B), the class interval of 50-54, for example, also includes all scores between 49.5 and 54.5, which means that, as regards functional characteristics, the exclusive and inclusive types are the same; however, both the upper and lower limits are included in the respective class intervals of the inclusive type as depicted under (B) (i.e., inclusive classes). The class interval of 49.5-54.5 in the boundary type (C) also logically includes all the scores between these two limits and is functionally the same as the other two. Furthermore, the class limits, the size of the class interval and the midpoint of the class intervals are the same for all three forms of distribution.
For the rapid tabulation of scores within their proper intervals, method ‘B’ is to be preferred to ‘A’ and ‘C’. In method ‘A’, it is very easy to let a score of 65 slip into the interval of 60-65 simply owing to the presence of 65 at the upper limit of the interval. Method ‘C’ is clumsy and time-consuming because of the need to write 0.5 at the lower and upper limits of every class interval. However, this method is helpful when the data contain decimal scores. Method ‘B’, while easiest for tabulation, offers the difficulty that in later calculations one must constantly remember that the expressed class limits are not the actual class limits; for example, the class interval of 50-54 begins at 49.5 (not at 50) and ends at 54.5 (not at 54). If this is clearly understood, then method ‘B’ is as accurate as methods ‘A’ and ‘C’. Method ‘B’ has generally been used throughout this book.

(c) Size of the class intervals. Since the size of the class interval is inversely proportional to the number of classes (i.e., class intervals) in a given distribution, it is obvious from the earlier discussion that once the number of class intervals is decided, the size of the class interval follows automatically where it is proposed to use equal-sized class intervals.
However, the choice of the size of the class interval will also largely depend on the sound subjective judgement of the statistician, keeping in mind other considerations such as N (total frequency), the nature of the data, the accuracy of the results and computational ease for further processing of the data. Here an approximate value of the magnitude (i.e., width) of the class interval, say ‘i’, can be obtained by using the Sturges rule, which reads

$i = \dfrac{\text{Range}}{\text{Number of class intervals}} = \dfrac{\text{Range}}{1 + 3.322 \log_{10} N}$,

where ‘Range’ is the difference between the largest (L) and smallest (S) scores in the given distribution; that is, Range = Largest score - Smallest score.

Example 2.4
Fifty students of class IX of a school appeared at a school test in mathematics. The student who stood first secured 76 marks and the student who stood last secured 12 marks out of a full mark of 100. Using the Sturges rule, decide the size of the class interval (i) to tabulate a frequency distribution.

Solution
Largest score = 76; Smallest score = 12; N = 50.
Range = Largest score - Smallest score = 76 - 12 = 64.

$i = \dfrac{\text{Range}}{1 + 3.322 \log N} = \dfrac{64}{1 + 3.322 \log 50} = \dfrac{64}{1 + 3.322} = \dfrac{64}{4.322} = 14.808 \approx 15$

This shows that the size of the class interval (i), 14.808, may be used as 14 or 15 according to convenience. This solution takes log 50 as 1, its integer part; with the exact value $\log_{10} 50 \approx 1.699$, the same formula gives $i = 64 / 6.644 \approx 9.63$, that is, a class interval of about 10.
On the other hand, given a frequency distribution, the size of the class interval is obtained in two ways: first, from the difference between the true upper limit and the true lower limit of a particular class interval and, second, from the difference between the upper or the lower score limits of two consecutive intervals, irrespective of the method of distribution used.

Example 2.5
Class interval = 50-54; i = 54.5 - 49.5 = 5, or

(A)          (B)          (C)
55-60        55-59        54.5-59.5
50-55        50-54        49.5-54.5
i = 5        i = 5        i = 5

(d) Midpoint of the class intervals. In a continuous frequency distribution, the midpoint of each class interval is used to represent the class.
Hence, the scores grouped within a given interval in a frequency distribution are supposed to be spread evenly over the entire interval. This assumption is made about a frequency distribution irrespective of the size of the class interval. Therefore, if the spread of scores around the mid-value is uneven, we run a higher risk of misinterpretation with a larger class interval. But such problems are well taken care of when the number of scores in the distribution is large. Even if this condition is not fulfilled, the midpoint assumption is not greatly in error because the lack of balance in one interval will usually be offset by the opposite condition in another interval.
As a rule, the midpoint of a class interval can be obtained as the average of the lower and upper class limits of the said interval. But one should remember that we have discussed three different forms of class intervals, in two of which exact limits are not used. Therefore, providing a method of calculation for each of the forms makes it more convenient.

Example 2.6

                  (A)                     (B)                  (C)
Class interval    55-60                   55-59                54.5-59.5
                  50-55                   50-54                49.5-54.5
Midpoint          (50 + 55 - 1)/2 = 52    (50 + 54)/2 = 52     (49.5 + 54.5)/2 = 52

As observed in the above example, in the exclusive or repeated type (A), the midpoint of a particular class interval is calculated by subtracting 1 from the sum of the two scores of the interval and then dividing the result by 2. But in the other two types, inclusive or non-repeated type (B) and boundary or decimal type (C), midpoints are calculated as a simple average of the two scores of the interval.
There is also another simple way to get the midpoints of the different class intervals in a distribution, in which either the uppermost or the lowermost midpoint of the class intervals of a distribution is first calculated by one of the aforementioned three methods, depending on the type of class intervals.
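The rules for the three forms of class interval can be put side by side in a short Python sketch. It assumes scores recorded in whole units (so the half unit is 0.5) and follows the chapter's reading of form (A), in which 50-55 covers 49.5 up to 54.5. The function names and the form labels 'A', 'B' and 'C' used as arguments are illustrative choices.

```python
def true_limits(lower, upper, form):
    """True (lower, upper) limits of a class written in form 'A', 'B' or 'C'."""
    if form == "A":                  # exclusive: 50-55 covers 49.5 up to 54.5
        return lower - 0.5, upper - 0.5
    if form == "B":                  # inclusive: 50-54 covers 49.5 up to 54.5
        return lower - 0.5, upper + 0.5
    if form == "C":                  # boundary: limits are already the true ones
        return float(lower), float(upper)
    raise ValueError("form must be 'A', 'B' or 'C'")


def class_size(lower, upper, form):
    """Size i of the class: true upper limit minus true lower limit."""
    low, high = true_limits(lower, upper, form)
    return high - low


def midpoint(lower, upper, form):
    """Midpoint by the rule for each form: (A) subtracts 1 before halving."""
    if form == "A":
        return (lower + upper - 1) / 2
    if form in ("B", "C"):
        return (lower + upper) / 2
    raise ValueError("form must be 'A', 'B' or 'C'")


for form, (lo, hi) in {"A": (50, 55), "B": (50, 54), "C": (49.5, 54.5)}.items():
    print(form, true_limits(lo, hi, form), class_size(lo, hi, form), midpoint(lo, hi, form))
```

All three forms give the same true limits (49.5 and 54.5), the same size (5) and the same midpoint (52), which is the sense in which they express identical facts.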
If, for example, the lowermost midpoint of a particular distribution is known, the succeeding midpoints can be calculated one by one by adding to it the size of the class interval (i.e., i) each time for each class interval. If, on the contrary, the uppermost midpoint is known, the midpoints of the following class intervals are obtained one by one by subtracting from it the size of the class interval each time for each class interval. In the above-mentioned example, the lowermost midpoint is 52. As we go on adding 5 (i.e., the size of the class interval) to it, we obtain 57, 62, 67 and 72, which are the midpoints of the succeeding class intervals. On the other hand, if we go on subtracting 5 (i.e., the size of the class interval) from the uppermost midpoint 72, we obtain 67, 62, 57 and 52, which are the midpoints of the following class intervals. However, one should be very careful in using this method because if an error is committed in the first place while calculating the midpoint of either the uppermost or the lowermost class interval, then the whole lot of midpoints will be wrong.

2.3.5 Steps for Continuous Frequency Distribution
Generally, there are two methods for the tabulation of scores into continuous frequency distributions. These two methods are known as the tally method and the entry form method, and they are discussed separately below.

2.3.5.1 Tally Method
The following are the steps to be followed for framing a continuous frequency distribution of scores under the tally method:
(i) Write the name of the variable or ‘X’ if the name is not given.
(ii) Below the title or name of the variable or ‘X’, arrange all the scores in rows, carefully putting a comma (,) after each score.
(iii) Pick up the highest and the lowest values from among the scores, and write them below the row(s) of scores. Find out the range of scores and note it below the row(s) of scores.
(iv) Count the number of scores in the entire series and write it below the row(s) as N = ?
(v) Decide about the number and size of the class intervals by applying the Sturges rule, if these are not already given.
(vi) Set up the class intervals, beginning either from the lowest score in ascending order or from the highest score in descending order, by appropriately choosing any one of the three forms of class intervals discussed earlier. It is customary to present the class intervals in a column, namely the first column of the frequency table. Write the heading ‘Class Intervals’ above it.
(vii) Cancel out the first score by an oblique stroke (\) on it and immediately put a vertical stroke (|) against the appropriate class interval in the ‘Tally’ column. This is how a score is recorded in the frequency distribution. Likewise, cancel out each score and put it as a tally in its appropriate interval. When you get four such tallies against a class interval, for the fifth score to be recorded, put a cross tally (\) over the first four tallies to indicate a block of 5. Continue like this until you have recorded all the scores. The tallies are drawn to facilitate counting.
(viii) Write ‘Frequency’ at the head of the third column and record the frequency against each class interval by correctly counting the tallies. Finally, sum up the frequencies over all the class intervals to indicate the number of cases in the distribution. Check whether or not the total of the frequencies agrees with the number of scores (N) written below the row(s) of scores.

Example 2.7
A test of comprehension in English was given to 70 students of Grade VIII. The following marks were obtained by them. Tabulate the data into an appropriate frequency distribution table.
Comprehension Scores: 38, 28, 2A BL, AW, 16, BZ, 3H, 38, 32, 36, 43, 40, 36, 3Z, 17, 36, 50, 34, 35, 38, 52, 39, 3Q,H, 3G, 3G, 32, 92,32 HL, I, 38, 34, 39, 99, 32, 36, 37, 40, 4H, 42, 18, 25, 26, 46, 47, 22,3, H, 33, 38,39, 38, 33H HL, 19,36, 48, 23, 28, 38, 32, 37, 34, 38, 36, 30

Highest score: 55    Lowest score: 16    Range = 55 - 16 = 39    N = 70    i = 5

Class Intervals    Tallies    Frequency
51-55
46-50
41-45
36-40
31-35
26-30
21-25
16-20

The total of the frequency column is equal to the total number of scores; in statistics, it is known as the number of cases or 'N'. The frequency distribution is the first step for some statistical computations.

2.3.5.2 Entry Form Method

The following are the steps to be followed for framing a continuous frequency distribution of scores under the entry form method:

(i) Write the name of the variable, or 'X' if the name is not given.
(ii) Below the title or name of the variable or 'X', arrange all the scores carefully in rows, putting a comma (,) after each score.
(iii) Pick out the highest and the lowest values from among the scores and write them below the row(s) of scores. Find the range of scores and note it below the row(s) of scores.
(iv) Count the number of scores in the entire series and write it below the row(s) as N=?
(v) Decide the number and size of the class intervals by applying the Sturges rule, if they are not already given.
(vi) Set up the class intervals, beginning either from the lowest score or from the highest score, by appropriately choosing any one of the three forms of class intervals discussed earlier. It is customary to present the class intervals in a row. Above the row, write the heading 'Class Intervals'.
(vii) Then cancel out the first score by an oblique stroke (\) on it and immediately write the score below the appropriate class interval. Continue like this until you record all the scores in the frequency distribution table.
Count the scores in each column to indicate the total frequency against each class interval.
(viii) Sum up the frequencies over all the class intervals and write the total in the lower right-hand corner below the column of scores as N=?

Example 2.8
Sixty students of Grade IX were given a test of quantitative reasoning, and their scores are given below. Tabulate the data into an appropriate frequency distribution table.

Quantitative reasoning scores: 1%, 3%, 80, 35, B1, 83, 3B, WW, KW, 35, 32, 46, 32, 34, 32,36, BO,

2.4 The Graphical Representation of the Frequency Distribution

One of the important aims of statistics is to simplify complex quantitative or numerical data so that they become meaningful as well as intelligible. Aid in analysing numerical data may often be obtained from a graphic treatment of the frequency distribution. Graphic methods or devices are pictorial presentations that catch the eye and hold the attention. Such presentations provide a bird's eye view of the whole mass of data. They are visual aids that involve less thinking and mental manipulation in reflecting on the data than tabular presentation. Graphic presentations are more impressive and attractive than the tabular presentation of data. It is a psychological fact that graphs, pictures and diagrams are catchy visual presentations with attention-catching power; they save time, provide scope for easy comparison and have a lasting effect on the mind of the observer. Moreover, pictorial or visual presentations translate numerical facts, which are often abstract and difficult to interpret, into a more concrete and understandable form. Sometimes, even complex relations are brought out clearly by these devices, as a result of which the analysis, interpretation and explanation of quantitative facts become easier for the researcher.
For this and other reasons, the investigator utilises the attention-catching power of the visual presentation of research data.

In general, there are four methods of representing a frequency distribution graphically: the frequency polygon, the histogram, the cumulative frequency graph and the cumulative percentage curve or Ogive. These graphic devices are treated below one after another. First of all, however, the general principles of the graphic presentation of data are discussed.

2.4.1 General Principles of Graphical Representation of Data

Let us now briefly review the simple algebraic principles which apply to all graphical representation of data. Graphing or plotting is done with reference to two straight lines called coordinate axes: the vertical or Y axis and the horizontal or X axis. These two basic lines are perpendicular to each other; the point where they intersect is called O or the origin. Figure 2.1 represents a system of coordinate axes.

The coordinate axes divide the plane into four divisions or quadrants. Let us label the first, second, third and fourth quadrants as a, b, c and d, respectively. In the upper-right division or the first quadrant (quadrant 'a'), both x and y measures are positive (+ +). In the upper-left division or the second quadrant (quadrant 'b'), x is minus and y is plus (- +). In the lower-left division or the third quadrant (quadrant 'c'), both x and y measures are negative (- -). In the lower-right division or the fourth quadrant (quadrant 'd'), x is positive or plus and y is negative or minus (+ -).

Examples
(i) Plot point 'A' whose coordinates are: x=4 and y=3.
(ii) Plot point 'B' whose coordinates are: x=-2 and y=4.
(iii) Plot point 'C' whose coordinates are: x=-3 and y=-5.
(iv) Plot point 'D' whose coordinates are: x=3 and y=-4.
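The sign rules for the four quadrants can be checked mechanically. The following is a small Python sketch, with a hypothetical function name, that returns the quadrant label (a, b, c or d) of a point from the signs of its coordinates and applies it to the four example points.

```python
def quadrant(x, y):
    """Return the quadrant label (a, b, c or d) for a point that is not on an axis."""
    if x == 0 or y == 0:
        raise ValueError("the point lies on an axis, not inside a quadrant")
    if x > 0 and y > 0:
        return "a"        # upper right: (+ +)
    if x < 0 and y > 0:
        return "b"        # upper left:  (- +)
    if x < 0 and y < 0:
        return "c"        # lower left:  (- -)
    return "d"            # lower right: (+ -)


# The four points of the examples.
points = {"A": (4, 3), "B": (-2, 4), "C": (-3, -5), "D": (3, -4)}
labels = {name: quadrant(x, y) for name, (x, y) in points.items()}
print(labels)
```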
Figure 2.1 System of coordinate axes

Solutions
To locate or plot point 'A', whose coordinates are x=4 and y=3, we go out from O four units to the right on the X axis and then up from the origin three units on the Y axis. Where the perpendiculars to these two points intersect, we locate point 'A' (see Figure 2.1). To locate or plot point 'B', whose coordinates are x=-2 and y=4, we go from O two units to the left along the X axis and then up from O four units on the Y axis. Where the perpendiculars to these two points intersect, we locate point 'B' (see Figure 2.1). Point 'C', whose coordinates are x=-3 and y=-5, is plotted by going left from O three units along the X axis and then down from O five units along the Y axis. We locate point 'C' where the perpendiculars to these two points intersect (see Figure 2.1). In the same manner, point 'D', whose coordinates are x=3 and y=-4, is plotted by going right from O three units along the X axis and then down from O four units on the Y axis, as shown in Figure 2.1.

The distance of a point from O on the X axis is commonly called the 'abscissa', and the distance of a point from O on the Y axis is called the 'ordinate'. For instance, the abscissa of point 'A' (see Figure 2.1) is +4 and the ordinate is +3.

In addition to the general principles mentioned above regarding the two axes (X and Y), there are some other general rules for the construction of a graph, which are discussed below:

(a) The X axis must run through zero and the Y axis must begin at zero as the origin. This zero is the base, and the curve is to be interpreted in terms of its distance from the baseline. Ordinarily, the curve and the axis lines are drawn heavier than the rest of the graph.
(b) The relationship between magnitude and distance should be properly maintained in the graph. Equal distances mean equal magnitudes. The scale should be chosen carefully, and it should accommodate the whole data.
In this connection, it may be noted that, though both the X and Y axes start from zero, it is not always possible to place all the points on either axis by starting the graph from the zero value. Say, for example, in a test of the trial-and-error method of learning with the mirror drawing apparatus, only five trials were taken, and the number of errors committed by the subject in these trials ranges from 7 to 11. It is, therefore, not necessary to show errors from 1 to 11 on the Y axis; rather, by giving a break in the Y axis with a break symbol, the errors can be shown from 5 upwards. This will give a good visual impression of the graph. Such breaks can be given on both axes whenever necessary, but only at the beginning or origin, adjacent to zero, and not in the middle. The main purpose in choosing the scale is to permit the whole data to be accommodated.
(c) The size of the graph, so far as the ratio between the two axes is concerned, is an important factor. There is no rigid rule regarding the ratio of the axes, but conventionally the ratio of height (Y axis) to width (X axis) may vary between 60 per cent and 80 per cent. In a well-balanced or well-proportioned graph, the height of the graph is 75 per cent of the width. This is known as the '75% rule'. In practice, it is not always possible to maintain the 75 per cent rule, but in no case should the ratio be below 60 per cent or above 80 per cent.
(d) The graph must have labels such as a headnote, footnote or title. As a matter of formal requirement, the scale caption (i.e., the label of a scale) of the X axis is to be placed at the centre of the horizontal axis and that of the Y axis on the left side of the centre of the vertical axis, preferably in bold capital letters. It is also customary to place the graph caption or title at the bottom of the graph.
(e) Although there is no rigid rule as to which axis should represent the dependent or the independent variable, there is a convention that the independent variable is shown on the X axis and the dependent variable on the Y axis.
(f) At times, it is necessary to draw two curves on the same axes. When two curves have to be drawn on the same base and on the same scale but with different values, the different curves are to be drawn either in two different colours or in two different ways, such as one in continuous lines and the other in dotted lines, so that visual differentiation will be easier.

2.4.2 The Frequency Polygon

A frequency distribution is very often presented graphically as a frequency polygon. Polygon means 'many-angled figure'. The frequency polygon exposes the characteristics of the distribution easily and clearly, and makes it convenient to compare the frequency distributions of more than one sample.

The frequency polygon is constructed by plotting a linear graph of the frequencies of the different class intervals against the midpoints of the respective intervals. It is an area diagram of a continuous frequency distribution; the area enclosed by the polygon represents the total number (n) of cases or scores of the sample, and the jagged surface of the polygon visually shows how the frequency changes from class to class in the sample. For example, Table 2.4 represents the frequency distribution of the examination scores of 70 students. The procedure for plotting a frequency polygon is as follows:

(a) The scores of the variable are scaled along the X axis or the abscissa. In fact, the midpoints ($X_c$) of the class intervals are marked along the X axis.
Besides the midpoints ($X_c$) of the class intervals containing the observed frequencies of the data, those of two additional class intervals with zero frequency are also entered on the X axis: one for the class interval just below the lowest class interval containing observed frequencies (e.g., 45-49) and the other for the class interval just above the highest class interval containing observed frequencies (e.g., 95-99). These two additional class intervals, each having zero frequency, enable the outline of the frequency polygon to reach the baseline or zero frequency level at both ends and thus close up the area of the frequency polygon. For instance, the midpoints 42 and 102 have been included on the X axis for the two additional class intervals with zero frequencies, namely, 40-44 and 100-104, over and above the midpoints of the class intervals containing the score frequencies (see Table 2.5 and Figure 2.2).

Table 2.4 Scores achieved by 70 students on statistics examination

Class Interval    Frequency
95-99             4
90-94             6
85-89             7
80-84             10
75-79             16
70-74             9
65-69             7
60-64             4
55-59             4
50-54             2
45-49             1
                  N=70

(b) The score frequencies are scaled along the Y axis or the ordinate. The scales for scores and frequencies should be so chosen that the ordinate (Y axis) at the peak of the polygon measures 75 per cent, or at least between 60 and 80 per cent, of its base or abscissa (X axis). The scales for both the X and Y axes should start from zero. However, if the lowest midpoint entered on the X axis is greater than zero, the X axis may be breached or interrupted by a zigzag line to bring the lowest midpoint and the polygon closer to the Y axis (see Figure 2.2).
(c) The frequency (f) or the observed frequency ($f_o$) of each class interval of the frequency distribution is then plotted against the midpoint of the corresponding class interval, including the two additional empty class intervals whose midpoints ($X_c$) have been entered on the X axis. Evidently, the points for those two empty intervals lie exactly on the X axis itself since they contain zero frequency.
(d) All the plotted points are then joined together by straight lines to complete the process of drawing or graphing the frequency polygon.

The steps to be followed in constructing a frequency polygon may be summarised as follows:

(i) Draw two straight lines perpendicular to each other: the vertical line near the left side of the paper and the horizontal line near the bottom. Label the vertical line (the Y axis) OY and the horizontal line (the X axis) OX. Put O where the two lines intersect. This point O is the origin.
(ii) Lay off the midpoints ($X_c$) of the score intervals of the frequency distribution at regular distances along the X axis. Begin with the midpoint of the class interval next below the lowest interval in the distribution and end with the midpoint of the class interval next above the highest interval in the distribution. Label the successive X distances with the midpoints ($X_c$) of the class intervals. Select an X unit which will allow all of the intervals to be represented easily on the graph paper.
(iii) Mark off on the Y axis successive units to represent the frequencies on the midpoints of the different intervals. Choose a Y scale which will make the largest frequency (the height) of the polygon approximately 75 per cent of the width of the figure.
(iv) At the midpoint of each class interval on the X axis, go up in the Y direction a distance equal to the number of scores (frequency) on the corresponding interval. Place points at these locations.
(v) Join these plotted points with straight lines to obtain the surface of the frequency polygon.

Table 2.5 Distribution of observed frequencies ($f_o$) and smoothed frequencies ($f_s$) of statistics examination scores given in Table 2.4

Class Intervals    Midpoints ($X_c$)    $f_o$    $f_s$
100-104            102                  0        1/3 (0 + 0 + 4) = 1.33 = 1.3
95-99              97                   4        1/3 (4 + 0 + 6) = 3.33 = 3.3
90-94              92                   6        1/3 (6 + 4 + 7) = 5.67 = 5.7
85-89              87                   7        1/3 (7 + 6 + 10) = 7.67 = 7.7
80-84              82                   10       1/3 (10 + 7 + 16) = 11.00 = 11.0
75-79              77                   16       1/3 (16 + 10 + 9) = 11.67 = 11.7
70-74              72                   9        1/3 (9 + 16 + 7) = 10.67 = 10.7
65-69              67                   7        1/3 (7 + 9 + 4) = 6.67 = 6.7
60-64              62                   4        1/3 (4 + 7 + 4) = 5.00 = 5.0
55-59              57                   4        1/3 (4 + 4 + 2) = 3.33 = 3.3
50-54              52                   2        1/3 (2 + 4 + 1) = 2.33 = 2.3
45-49              47                   1        1/3 (1 + 2 + 0) = 1.00 = 1.0
40-44              42                   0        1/3 (0 + 1 + 0) = 0.33 = 0.3
                                        N=70     N=70.0

2.4.2.1 Smoothing the Frequency Polygon

The frequency polygon for small samples is more jagged and more irregular in shape than the frequency polygon for large samples. The smaller the sample, the more irregular the jaggedness of the polygon. Therefore, to iron out chance irregularities, and to give the polygon for a small sample a less jagged and more regular shape like that expected of a large sample, the frequency polygon may be smoothed as shown in Figure 2.2; that is, the observed frequency ($f_o$) of each class interval of a small sample may be adjusted or 'smoothed'. Smoothing makes the distribution look as it might if the data were more numerous. In smoothing, a series of 'moving' or 'running' averages is taken, from which new or adjusted frequencies are determined. The smoothed frequency polygon is plotted in the following manner:
Figure 2.2 Original and smoothed frequency polygons of the distribution of scores given in Table 2.5 (X axis: midpoints of class intervals; Y axis: frequencies)

(a) The smoothed frequency ($f_s$) of each class interval is computed from the observed frequencies ($f_o$) of the given class interval and of the two intervals immediately above and below it. In other words, to find the adjusted or smoothed frequency of a particular class interval, we add the $f_o$ on the given interval and the $f_o$ on the two adjacent intervals (the interval just above and the interval just below) and divide the sum by 3:

$$f_s = \frac{(f_o \text{ of the relevant class interval}) + (f_o \text{ of the next higher class interval}) + (f_o \text{ of the next lower class interval})}{3}$$

For example, the smoothed frequency ($f_s$) for the class interval 75-79 (see Table 2.5) is (16 + 10 + 9)/3 = 11.67 = 11.7.
(b) The smoothed frequencies are also calculated for the two extreme additional class intervals containing zero observed frequencies. Note that if we omit these two additional class intervals, N for the smoothed frequency distribution will be less than 70, as the smoothed distribution has frequencies outside the range of the original distribution. The smoothed frequencies ($f_s$) for all the class intervals are computed in this way and shown in Table 2.5.
(c) The computed smoothed or adjusted frequencies are then plotted against the midpoints of the respective class intervals.
(d) The plotted points are then joined by straight lines (dotted or broken lines) to give the smoothed frequency polygon (see Figure 2.2). However, unlike the original frequency polygon, the smoothed polygon does not reach or meet the X axis at the two extreme ends.

If the already smoothed frequencies in Figure 2.2 are subjected to a second smoothing, the outline of the frequency surface will become more nearly a continuous flowing curve.
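The running average of three described above can be sketched in a few lines of Python. The function name is hypothetical; the frequencies are the observed frequencies of Table 2.5, listed from the lowest interval (40-44) to the highest (100-104), with intervals beyond the two ends assumed to hold zero cases.

```python
def smooth(frequencies):
    """Return running averages of three for a list of class frequencies.

    The list runs from the lowest to the highest class interval and already
    contains the two empty intervals added at the ends. Intervals beyond
    those ends are treated as having zero frequency.
    """
    padded = [0] + list(frequencies) + [0]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


# Observed frequencies of Table 2.5, lowest interval (40-44) first.
f_o = [0, 1, 2, 4, 4, 7, 9, 16, 10, 7, 6, 4, 0]
f_s = smooth(f_o)

rounded = [round(v, 1) for v in f_s]   # one decimal place, as in Table 2.5
print(rounded)
print("N =", round(sum(f_s), 1))       # the total number of cases is preserved
```

Because the two end intervals are empty, every observed frequency enters exactly three averages, which is why the smoothed frequencies still sum to N.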
It is doubtful, however, whether so much adjustment of the original frequencies is often warranted. When an investigator presents only the smoothed frequency polygon and does not give the frequency polygon containing his/her original data, it is impossible for a reader to tell with what he/she started. Moreover, smoothing gives a picture of what an investigator might have obtained (not what he/she did obtain) if his/her data had been more numerous or less subject to error than they were. Thus, while plotting the smoothed frequency polygon, the original frequency polygon containing the observed frequencies should be plotted simultaneously on the same axes against the same midpoints of the class intervals, as shown in Figure 2.2. If N is large, smoothing may not greatly change the shape of the graph and, hence, is often unnecessary. Probably, the best course for the beginner to follow is to smooth data as little as possible. When smoothing seems to be indicated in order better to bring out the facts, one should always be careful to present the original data along with the 'smoothed' or 'adjusted' results.

2.4.2.2 Plotting Two Frequency Distributions on the Same Axes When Samples Differ in Size

Because more than one frequency polygon may be overlapped or superimposed without much confusion or clumsiness, frequency polygons are very useful in comparing the frequency distributions of a variable in several samples. They also give a good visual idea of the contours of the distribution. But the uneven or jagged surface of the polygon fails to portray precisely the proportionate frequencies in the class intervals because the area of the polygon between the ordinates at the two limits of an interval is hardly proportional to the frequency in that interval. Table 2.6 gives the distributions of scores on an achievement test made by two groups, A and B, which differ considerably in size.
Group A has 60 cases (N=60) and Group B has 160 cases (N=160).

Table 2.6 Achievement test scores of two groups A and B

Scores    Midpoints    Group A (f)    Group B (f)    Group A Percentage Frequencies    Group B Percentage Frequencies
80-89     84.5         0              9              0.0                               5.6
70-79     74.5         3                             5.0
60-69     64.5         10             32             16.7                              20.0
50-59     54.5         16             48             26.7                              30.0
40-49     44.5
30-39     34.5
20-29     24.5
10-19     14.5         4              0              6.7                               0.0
                       60             160            100                               100.1

Figure 2.3 Frequency polygons of the two distributions in Table 2.6; scores are laid off on the X axis and percentage frequencies on the Y axis

If the two distributions in Table 2.6 are plotted as polygons on the same coordinate axes, the fact that the frequencies (f's) of Group B are so much larger than those of Group A makes it hard to compare directly the range and quality of achievement in the two groups. A useful device in cases where the N's differ in size is to express both distributions in terms of percentage frequencies, as shown in Table 2.6. Now, both N's are 100, and the f's are comparable from interval to interval. Frequency polygons representing the two distributions, in which percentage frequencies instead of the original f's have been plotted on the same axes, are shown in Figure 2.3. These polygons provide an immediate comparison of the relative achievement of the two groups. This type of comparison cannot be made from polygons plotted from original frequencies that differ in size.

Percentage frequencies are readily found in two ways: first, by dividing each f by N and then multiplying by 100 (i.e., f/N × 100); and second, by dividing 100 by N and then multiplying by each f (i.e., 100/N × f). For example, if N=60 and f=3, the percentage frequency can be calculated as 3/60 × 100 = 5.0 or 100/60 × 3 = 5.0.
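The conversion to percentage frequencies is a one-line calculation. The following Python sketch applies it to two small hypothetical groups with N=60 and N=160; the function name and the frequencies are illustrative only and are not the values of Table 2.6.

```python
def percentage_frequencies(frequencies):
    """Convert class frequencies to percentages of N, rounded to one decimal place."""
    n = sum(frequencies)                      # N, the total number of cases
    return [round(f * 100 / n, 1) for f in frequencies]


# Hypothetical frequencies for two groups of different size.
group_a = [3, 12, 30, 15]                     # N = 60
group_b = [16, 48, 64, 32]                    # N = 160

pct_a = percentage_frequencies(group_a)
pct_b = percentage_frequencies(group_b)
print(pct_a)   # first entry reproduces the worked example: 3/60 x 100 = 5.0
print(pct_b)
```

Once both groups are expressed on the common base of 100, the two lists can be plotted on the same axes and compared interval by interval.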
What percentage frequencies do, in effect, is to scale each distribution down to the same total N of 100, thus permitting a comparison of f's for each interval.

2.4.3 The Histogram or Column Diagram

Another way of representing a frequency distribution graphically is by means of a histogram or column diagram. A histogram or column diagram is a graphical representation of the frequency distribution of a continuous quantitative variable. It is an area diagram of a continuous frequency distribution and consists of a continuous set of bars; consecutive bars are not separated by any intervening space, thus indicating that there is no real gap in the scale of scores of the relevant variable. The total area of the histogram represents the total number (n) of cases in the sample, while the area of each bar is proportional to the frequency of cases in a particular class interval. In the frequency polygon, all of the cases within a given class interval are shown to be located at its midpoint only, whereas in a histogram, the cases in a class interval are shown to be uniformly distributed over the entire length of the interval.

Figure 2.4 Histogram of a frequency distribution of body weights given in Table 2.7

Table 2.7 Frequency distribution of body weights in a sample

Class Intervals                   Frequencies (f)
Score Limits    True Limits
71-75           70.5-75.5         2
66-70           65.5-70.5
61-65           60.5-65.5         15
56-60           55.5-60.5         25
51-55           50.5-55.5         17
46-50           45.5-50.5         9
41-45           40.5-45.5         4

Within each interval of a histogram, the frequency is shown by a rectangle, the base of which is the length of the class interval and the height of which is the number of cases within the interval. This spares the histogram the jagged appearance of the frequency polygon and makes it more convenient for a precise visual portrayal of the proportionate frequencies in the class intervals.
The outline of the upper surfaces of the bars thus gives an approximate visual idea of the shape of the frequency curve. However, in comparing the frequency distributions of more than one sample, the histogram is less convenient than the polygon because as many separate histograms have to be used as there are samples, and they cannot be superimposed on each other without confusion or clumsiness.

The following steps are to be followed in drawing a histogram from the frequency distribution of a sample.

(a) In order to avoid intervening gaps between consecutive class intervals, true class limits have to be used in plotting the histogram. So, the true class limits of all the class intervals are first computed from their respective score limits. Unlike the frequency polygon, no additional class interval with zero frequency is included below or above the class intervals bearing observed frequencies.
(b) The scores (X) of the variable studied are scaled along the abscissa or X axis on graph paper; the true class limits of all the class intervals of the frequency distribution are marked on the X axis. If the lower true limit of the lowest class interval is far higher than 0, the X axis may be breached or interrupted by a zigzag line between 0 and that true limit to bring the latter, as well as the histogram, closer to the Y axis (see Figure 2.4).
(c) The frequencies (f) are scaled along the ordinate or Y axis. The scales for X and f should be so chosen that the ordinate for the highest frequency, that is, the height of the tallest bar, measures 60 per cent to 80 per cent of the base of the histogram, or follows the 75 per cent rule discussed earlier.
(d) Two ordinates are raised on the X axis at the true limits of each class interval, and the top end of the rectangle being formed is closed by a horizontal line at the level of the frequency (f) of that class interval indicated by the Y axis scale.
A bar is, thus, formed with its base extending over the length of the class interval and its height corresponding to the frequency of cases in that interval. This is repeated for all the class intervals in the data to draw a set of bars.

This type of graph is illustrated in Figure 2.4, based on the data given in Table 2.7. As long as the class intervals, and so the bases of the bars, are of equal length, the areas of the bars are proportional to their heights and so to the frequencies in the respective intervals. When the same frequency is found on two or more adjacent intervals, the rectangles are of the same height. The highest rectangle is that of the interval 55.5-60.5, which has 25, the largest frequency, as its height (see Figure 2.4).

The bars, however, differ in the width of their bases if the class intervals are of unequal sizes. In such cases, the heights of the bars fail to give an idea of the proportional frequencies in the class intervals. To remedy this, the frequency (f) of each interval is divided by the class size (i) of that interval to give the frequency density, which is the average frequency per unit length of the interval, namely, f/i. Each bar is then drawn with its height equalling the frequency density f/i and its base coinciding with the original class size (i) of that interval. The areas of these bars are proportional to the frequencies in the respective class intervals of unequal class sizes (see Figure 2.5). A histogram with unequal class intervals is illustrated in Figure 2.5, based on the data given in Table 2.8.

2.4.3.1 Plotting a Histogram and a Frequency Polygon of a Frequency Distribution on the Same Axes

Scores achieved by 50 students on a statistics test are given in Table 2.9, which serves as data for plotting both a histogram and a frequency polygon (see Figure 2.6).
In plotting a histogram and a frequency polygon for a particular frequency distribution (i.e., the data given in Table 2.9) on the same axes, the same rules are followed as in drawing individual histograms or frequency polygons.

Figure 2.5 Histogram of a distribution of frequency densities with unequal class intervals given in Table 2.8

Table 2.8 Distribution of frequency densities in unequal class intervals

Class Intervals                   Size of Interval (i)    Frequencies (f)    f/i
Score Limits    True Limits
176-183         175.5-183.5       8                       4                  0.5
171-175         170.5-175.5       5                       10                 2.0
166-170         165.5-170.5       5                       20                 4.0
161-165         160.5-165.5       5                       25                 5.0
151-160         150.5-160.5       10                      5                  0.5
                                                          64

Since each class interval in a histogram is represented by a separate rectangle, it is not necessary to project the sides of the rectangles down to the baseline, as has been done in Figure 2.5. The rise and fall of the boundary line shows the increase or decrease in the number of scores/cases from interval to interval and is usually the important fact to be brought out (see Figure 2.6).

As in a frequency polygon, the total frequency (n) is represented by the area of the histogram. In contrast to the frequency polygon, however, the area of each rectangle in a histogram is directly proportional to the number of cases within the interval. For this reason, the histogram presents an accurate picture of the relative proportions of the total frequency from interval to interval.

In order to provide a more detailed comparison of the two types of frequency graphs, the distribution in Table 2.9 is plotted upon the same coordinate axes in Figure 2.6 as a frequency polygon and as a histogram.
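The frequency densities used for a histogram with unequal class intervals can be checked with a short Python sketch. The function name is hypothetical; the true limits and frequencies are those read from Table 2.8, listed from the lowest interval upwards.

```python
def frequency_densities(true_limits, frequencies):
    """Return f/i for each class interval, where i is the width between its true limits."""
    return [f / (upper - lower)
            for (lower, upper), f in zip(true_limits, frequencies)]


# True class limits and frequencies of Table 2.8, lowest interval first.
true_limits = [(150.5, 160.5), (160.5, 165.5), (165.5, 170.5),
               (170.5, 175.5), (175.5, 183.5)]
frequencies = [5, 25, 20, 10, 4]

densities = frequency_densities(true_limits, frequencies)   # heights of the bars
# Area of each bar = height x base; it should give back the frequency.
areas = [d * (upper - lower)
         for d, (lower, upper) in zip(densities, true_limits)]

print(densities)
print(areas)
```

The check on the areas illustrates the point made in the text: with heights equal to f/i, the area of each bar, not its height, is proportional to the frequency of its interval.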
Figure 2.6 Frequency polygon and histogram of 50 statistics test scores as shown in Table 2.9

Table 2.9 Scores achieved by 50 students on a statistics test

Class Intervals (scores)    Midpoints    True Limits    Frequencies
95-99                       97           94.5-99.5
90-94                       92           89.5-94.5      4
85-89                       87           84.5-89.5      6
80-84                       82           79.5-84.5      10
75-79                       77           74.5-79.5      10
70-74                       72           69.5-74.5      2
65-69                       67           64.5-69.5      4
60-64                       62           59.5-64.5      7
55-59                       57           54.5-59.5      2
50-54                       52           49.5-54.5      0
                                                        50

2.4.3.2 Difference Between Frequency Polygon and Histogram

Although both the frequency polygon and the histogram are used for the graphical representation of a frequency distribution and are alike in many respects, they possess some points of difference. The following are some of these differences.

(i) A frequency polygon is a line graph of a given frequency distribution, whereas a histogram is essentially a bar graph or column diagram of the same frequency distribution.
(ii) In a frequency polygon, all the frequencies in a class interval are assumed to be concentrated or located at its midpoint only. Thus, it merely points out the graphical relationship between the midpoint and the frequency of a class interval, and is unable to show the distribution of frequencies within each class interval. But in a histogram, the frequency of each class interval is shown to be uniformly distributed over the entire length of the class interval, and the area of each rectangle is directly proportional to the number of cases within the interval. For this reason, a histogram gives a very clear as well as accurate picture of the relative proportions of the total frequency from interval to interval. A mere glimpse of the histogram enables us to know which class interval has the largest or smallest frequency, which pair of class intervals has the same frequency and so on.
(iii) A frequency polygon gives a jagged appearance, whereas a histogram is more convenient for a precise visual portrayal of the proportionate frequencies in the class intervals. The outline of the upper surface of the bars thus gives an approximate visual idea of the shape of the frequency curve.
(iv) A frequency polygon is less precise than a histogram in that it does not represent the frequency upon each class interval accurately.
(v) In comparing two or more frequency distributions by plotting two or more graphs on the same axes, however, a frequency polygon is likely to be more useful and practicable than a histogram, as the vertical and horizontal lines in the histograms will often coincide.
(vi) In comparison to a histogram, a frequency polygon gives a much better conception of the contours of the frequency distribution. With a part of the polygon curve, it is easy to know the trend of the distribution, but a histogram is unable to tell us about it.
(vii) Both the frequency polygon and the histogram tell the same story and enable us to show in graphical form how the scores in the group are distributed: whether they are piled up at the low or high end of the scale or are evenly and regularly distributed over the scale. If the test is too easy, scores accumulate at the high end of the scale, whereas if the test is too tough, scores will crowd the low end of the scale. When the test is well suited to the abilities of the group, scores will be distributed symmetrically around the mean, with a few individuals scoring quite high, a few quite low and the majority falling somewhere near the middle of the scale. When this happens, the frequency polygon approximates the ideal normal frequency curve. In this situation, the use of a frequency polygon is preferred to that of a histogram.

2.4.4 The Cumulative Frequency Graph

The cumulative frequency graph is another way of representing a frequency distribution by means of a diagram.
It is essentially a line graph drawn on graph paper by plotting the true upper limits of the class intervals on the X axis and the cumulative frequencies of these class intervals on the Y axis. Let us take the data given in the cumulative frequency distribution in Table 2.10 to explain the process of the construction of a cumulative frequency graph (see Figure 2.7).

Table 2.10 Cumulative frequency distribution of reading scores in a sample

Class Interval   True Upper Limits   Frequency (f)   Cumulative Frequency (cf)
95-99            99.5                4               70
90-94            94.5                6               66
85-89            89.5                7               60
80-84            84.5                10              53
75-79            79.5                16              43
70-74            74.5                9               27
65-69            69.5                7               18
60-64            64.5                4               11
55-59            59.5                4               7
50-54            54.5                2               3
45-49            49.5                1               1

Figure 2.7 Cumulative frequency graph for the data given in Table 2.10 (X axis: scores, true upper limits of class intervals from 44.5 to 99.5; Y axis: cumulative frequencies from 0 to 70)

A cumulative frequency graph is a graphical representation of the distribution of cumulative frequencies in the class intervals of the sample. It is drawn in the following manner, using the frequency distribution given in Table 2.10:

(a) The actual or true upper limits ($X_u$) of all the class intervals in the frequency distribution are computed and entered against the respective intervals (see Table 2.10). The $X_u$ is computed as follows:

$X_u = \frac{1}{2}(\text{upper score limit of the given interval} + \text{lower score limit of the next higher interval})$

(b) The cumulative frequencies (cf) of all the class intervals are determined and written against the respective class intervals. The cumulative frequency (cf) of a class interval is the sum of the frequencies (f) of all the k intervals from the lowest interval up to the upper limit of the kth interval under consideration. Where $cf_1, cf_2, cf_3, \dots, cf_k$ are the cumulative frequencies and $f_1, f_2, f_3, \dots, f_k$ are the observed frequencies of the successive class intervals of the frequency distribution in an ascending order of the intervals:

$cf_1 = f_1$
$cf_2 = f_1 + f_2$
$cf_3 = f_1 + f_2 + f_3$
$cf_k = f_1 + f_2 + f_3 + \dots + f_k$

Another way to determine the cumulative frequencies of the class intervals in ascending order is the following:

$cf_1 = f_1$
$cf_2 = cf_1 + f_2$
$cf_3 = cf_2 + f_3$
$cf_k = cf_{k-1} + f_k$

To illustrate (see Table 2.10), the first cf is 1; 1 + 2, from the low end of the distribution, gives 3 as the next entry; 3 + 4 = 7; 7 + 4 = 11; 11 + 7 = 18; 18 + 9 = 27 and so on. The last cumulative frequency is equal to 70 or n, the total frequency.

(c) The scores (X) are scaled along the X axis, marking the true upper limits of all the class intervals on it. In addition, the cumulative frequencies (cf) are scaled along the Y axis.

(d) The cumulative frequency (cf) of each class interval is then plotted against its true upper limit. This is because, in adding progressively from the bottom up, each cumulative frequency carries through to the exact upper limit of the interval. The first point on the curve is 1 Y unit (the cf on the 45-49 class interval) above 49.5; the second point is 3 Y units above 54.5; the third point is 7 Y units above 59.5 and so on to the last point, which is 70 Y units above 99.5 (see Table 2.10).

(e) The plotted points are joined by straight lines to give the cumulative frequency graph, which is S-shaped. In order to have the curve begin on the X axis, it is started at 44.5 (the true upper limit of the class interval 40-44 or the true lower limit of the class interval 45-49 given in Table 2.10), the cf of which is 0 (see Figure 2.7).

An important difference between a frequency polygon and a cumulative frequency curve is that in a frequency polygon the frequency on each class interval is plotted against its midpoint, whereas in a cumulative frequency curve the cumulative frequency on each class interval is plotted against its true upper limit.
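The tabulation in steps (a) and (b) can be sketched in a few lines of Python. This is a minimal illustration and not part of the original text: the frequencies are those of Table 2.10 as implied by the running sums quoted above (1, 3, 7, 11, 18, 27 and so on to 70), and the variable names are chosen here purely for illustration.

```python
from itertools import accumulate

# Class intervals of Table 2.10 in ascending order (score limits), with the
# frequencies (f) implied by the running sums quoted in the text.
intervals = [(45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74),
             (75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]
frequencies = [1, 2, 4, 4, 7, 9, 16, 10, 7, 6, 4]

# Step (a): true upper limit = half of (upper score limit + lower score limit
# of the next higher interval). Scores are whole numbers, so the next higher
# interval starts at upper + 1.
true_upper_limits = [(upper + (upper + 1)) / 2 for _, upper in intervals]

# Step (b): cf_k = cf_(k-1) + f_k, accumulated from the lowest interval upward.
cumulative = list(accumulate(frequencies))

# Points to plot: the curve starts on the X axis at the true lower limit of
# the lowest interval, where cf = 0.
points = [(intervals[0][0] - 0.5, 0)] + list(zip(true_upper_limits, cumulative))

for x, cf in points:
    print(x, cf)
```

The `points` list holds the coordinates that steps (d) and (e) plot and join, beginning with (44.5, 0) and ending with (99.5, 70).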
2.4.5 The Cumulative Percentage Curve or Ogive

The cumulative percentage curve or Ogive is an important graphical representation of the cumulative percentage frequency distribution of a continuous variable. It is essentially a line graph drawn on a piece of graph paper by plotting the true upper limits of the class intervals on the X axis and their cumulative percentage frequencies on the Y axis. In this way, the Ogive differs from the cumulative frequency graph in the sense that in an Ogive the cumulative percentage frequencies, rather than the cumulative frequencies, are plotted on the Y axis.

Table 2.11 Cumulative percentage frequency distribution of reading scores in a sample

Class Interval   True Upper Limit   Frequency (f)   Cumulative Frequency (cf)   Cumulative Percentage Frequency (cP)
75-79            79.5               1               125                         100.0
70-74            74.5               3               124                         99.2
65-69            69.5               6               121                         96.8
60-64            64.5               12              115                         92.0
55-59            59.5               20              103                         82.4
50-54            54.5               36              83                          66.4
45-49            49.5               20              47                          37.6
40-44            44.5               15              27                          21.6
35-39            39.5               6               12                          9.6
30-34            34.5               4               6                           4.8
25-29            29.5               2               2                           1.6

The cumulative percentage frequency (cP) of a class interval is its cumulative frequency (cf) expressed as a percentage of the total frequency (n) of a sample:

$cP = \frac{cf}{n} \times 100$

A cumulative percentage distribution is a tabulated form of the cumulative percentage frequencies according to the class intervals of the scores in the grouped data of a sample (see Table 2.11). The cumulative percentage curve or cumulative percentage Ogive is the graphic expression of the cumulative percentage frequencies in the class intervals of a sample.

The process of construction of an Ogive may be better understood through the following graphical representation (see Figure 2.8) based on the data given in Table 2.11.
Figure 2.8 Cumulative percentage curve or Ogive plotted from the data of Table 2.11 (X axis: scores, true upper limits of class intervals from 24.5 to 79.5; Y axis: cumulative percentages (cP) from 0.0 to 100.0)

The cumulative percentage (cP) curve or Ogive is drawn in the following way from the classified data of a sample given in Table 2.11:

(a) The actual or true upper limits ($X_u$) of all the class intervals in the frequency distribution are computed and entered against the respective intervals. The computation of the true upper limit ($X_u$) of class intervals is done in the following manner:

$X_u = \frac{1}{2}(\text{upper score limit of the given interval} + \text{lower score limit of the next higher interval})$

(b) The cumulative frequencies (cf) of all the class intervals are computed and tabulated, starting from the lowest class interval or from the low end of the distribution upward, as described earlier for the drawing of the cumulative frequency graph. For example, the cf for the kth interval will be determined as:

$cf_k = f_1 + f_2 + \dots + f_k$ or $cf_k = cf_{k-1} + f_k$,

and then it will be entered against the respective class interval.

(c) The cumulative percentage frequency (cP) of each class interval is then computed and tabulated, using the sample size (n) and the cf of that interval (see Table 2.11):

$cP = \frac{cf}{n} \times 100$

Another method of computing the cP of each class interval is as follows:

$cP = \frac{1}{n} \times cf \times 100$

In this method, we first determine the reciprocal, 1/n, called the rate, and multiply each cf in order by this fraction; the resultant amount is then multiplied by 100 in order to obtain the percentage. For example, as per the data given in Table 2.11, the rate is 1/n = 1/125 = 0.008. Hence, multiplying 2 (the cf of the class interval 25-29 in Table 2.11) by 0.008, we obtain 0.016, which is then multiplied by 100 to obtain 1.6 per cent.
Similarly, 6 × 0.008 = 0.048, and 0.048 × 100 = 4.8 per cent; 12 × 0.008 = 0.096, and 0.096 × 100 = 9.6 per cent and so on. The cumulative percentage distribution is, thus, obtained, and the last cumulative percentage frequency is equal to 100, which means that n is converted to 100.

(d) The scores (X) of the variable are scaled along the X axis on a piece of graph paper, marking the true upper limits of all the class intervals on that line. In addition, the true lower limit of the lowest class interval is also marked on the X axis as the true upper limit of the next lower interval with a cP of 0. From Table 2.11, it is evident that the lowest class interval is 25-29, whose true lower limit is 24.5. The cumulative percentage frequencies (cP) are scaled along the Y axis of the curve (see Figure 2.8). The size of the graph should follow the 75% rule.

(e) The cumulative percentage (cP) frequency of each class interval is then plotted against its true upper limit (see Figure 2.8). The first point on the Ogive is placed 1.6 Y units (i.e., the cP on the 25-29 class interval) just above 29.5; the second point is 4.8 Y units just above 34.5 and so on. The last point is 100 Y units above 79.5, the exact or true upper limit of the highest class interval (see Table 2.11).

(f) The plotted points are joined by straight lines to complete the cumulative percentage curve or Ogive. In order to have the curve begin on the X axis, it is started at 24.5 (the true upper limit of the class interval 20-24 or the true lower limit of the class interval 25-29 given in Table 2.11), the cP of which is 0 (see Figure 2.8).

One important difference between the cumulative frequency (cf) curve and the cumulative percentage (cP) curve is that in the cf curve the cfs are plotted against the true upper limits of the respective class intervals, whereas in the cP curve or Ogive the cPs are plotted against the true upper limits of the respective class intervals.
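The computation in steps (a) to (c) can be checked with a short Python sketch. It is offered only as an illustration under stated assumptions: the frequencies are read from Table 2.11, the interval width of 5 is taken from the class limits shown there, and all variable names are hypothetical.

```python
from itertools import accumulate

# Lower score limits and frequencies (f) of Table 2.11, lowest interval first.
lower_limits = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
frequencies = [2, 4, 6, 15, 20, 36, 20, 12, 6, 3, 1]
interval_width = 5  # assumption: every class interval spans 5 score points

n = sum(frequencies)                          # total frequency
cumulative = list(accumulate(frequencies))    # cf of each interval

# cP = (cf / n) * 100, rounded to one decimal place as in the table.
cumulative_pct = [round(100 * cf / n, 1) for cf in cumulative]

# True upper limit of each interval, e.g. 25-29 -> 29.5.
true_upper_limits = [low + interval_width - 0.5 for low in lower_limits]

# The Ogive starts on the X axis at the true lower limit of the lowest interval.
ogive_points = [(lower_limits[0] - 0.5, 0.0)] + list(
    zip(true_upper_limits, cumulative_pct)
)

for x, cp in ogive_points:
    print(x, cp)
```

Joining `ogive_points` with straight lines, as in step (f), gives the Ogive from (24.5, 0.0) to (79.5, 100.0).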
2.4.5.1 Smoothing of the Ogive

One may ask why a frequency polygon or an Ogive needs smoothing at all. The answer is that the frequency curves obtained from frequency distributions are often so irregular and disproportionate that it becomes quite difficult to give them a useful interpretation. This usually happens where the total frequency (n) of a sample is small and the frequency distribution is somewhat irregular. These irregularities in frequency distributions and the effects of sampling fluctuations on the frequencies in different class intervals can be minimised by a fairly large sample. An increase in the sample size (n) generally reduces sample irregularities or chance irregularities. However, it is not always possible for the researcher to increase the sample size, owing to his/her limitations with regard to time, money and labour. In these situations, the kinks and irregularities in the frequency curves may be removed through the process of smoothing the curve. Thus, to iron out chance irregularities and also to obtain a better notion of how the curve might look if the data were more numerous, that is, obtained from a fairly large sample, the curve needs to be smoothed. Therefore, the Ogive is smoothed in order to iron out minor kinks and irregularities in the curve. Owing to the smoothing process, the smoothed Ogive is more regular and continuous than the original Ogive (see Figure 2.9).

The process of construction and smoothing of an Ogive may be better understood through the following graphical representation (see Figure 2.9) based on the data given in Table 2.12. The smoothed cP Ogive is drawn in the following way from the classified data of a sample given in Table 2.12.
Figure 2.9 Smoothed cumulative percentage frequency (ScP) curve or Ogive of the scores in Table 2.12 (X axis: scores, true upper limits of class intervals from 47.5 to 74.5; Y axis: smoothed cumulative percentage frequencies (ScP) from 0.0 to 100.0)

Table 2.12 Distribution of cumulative percentages and smoothed cumulative percentages of arithmetic reasoning test scores in a sample

Class Intervals   X_u    f    cf   cP      Smoothed cP
72-74             74.5   0    80   100.0   100.0
69-71             71.5   3    80   100.0   98.8
66-68             68.5   8    77   96.3    94.2
63-65             65.5   15   69   86.3    83.4
60-62             62.5   28   54   67.5    62.1
57-59             59.5   14   26   32.5    38.3
54-56             56.5   7    12   15.0    17.9
51-53             53.5   5    5    6.3     7.1
48-50             50.5   0    0    0.0     2.1

(a) The class intervals containing the frequencies (f), as well as two additional class intervals located, respectively, just below and just above that range of intervals and each having zero frequency, are tabulated along with their respective frequencies.

(b) The true upper limits ($X_u$) of all the class intervals are then computed and entered against the respective intervals. The computation of the $X_u$ of class intervals is done by following the same formula as described earlier for the drawing of cumulative frequency curves or cumulative percentage frequency curves.

(c) The cumulative frequencies (cf) of all the class intervals are then computed and tabulated, starting from the lowest class interval, and the computed cfs are entered against the respective intervals. The procedure for computing the cfs of all class intervals is the same as described earlier for the drawing of the cumulative frequency graph or the cumulative percentage frequency curve.

(d) The cumulative percentage frequency (cP) of each interval is then computed, tabulated and entered against the respective class intervals. The procedure for computing the cP of all class intervals is the same as described earlier for the drawing of the cP curve or Ogive.
(e) The only difference between the process of smoothing an Ogive and smoothing a frequency polygon is that we average cumulative percentage frequencies in the Ogive instead of the actual frequencies. The smoothed cumulative percentage (cP) frequency is computed for each class interval, using the cP of that interval and those of the two intervals just below and just above the given interval. The smoothed cP of a particular class interval can be computed as given below:

$\text{Smoothed } cP = \frac{1}{3}\left[(cP \text{ of the given interval}) + (cP \text{ of the next lower interval}) + (cP \text{ of the next higher interval})\right]$

The smoothed cumulative percentage frequencies (ScP) are given in Table 2.12. For example, the smoothed cP to be plotted against 53.5 (the $X_u$ of the 51-53 interval given in Table 2.12) is

$\frac{6.3 + 0.0 + 15.0}{3} = \frac{21.3}{3} = 7.1$

Care must be taken at the extremes of the distribution, where the procedure is slightly different. With reference to Table 2.12, for example, the smoothed cumulative percentage frequency at 50.5 is

$\frac{0.0 + 0.0 + 6.3}{3} = 2.1$

and at 74.5 is

$\frac{100.0 + 100.0 + 100.0}{3} = 100.0$

This is because the cP of the class interval just below the class interval of 48-50 is 0.0, and the cP of the class interval just above the class interval of 72-74 is 100.0. Note that the smoothed Ogive extends one class interval beyond the original at both extremes of the distribution.

(f) The scores (X) of the variable are scaled along the X axis on a piece of graph paper, marking the true upper limits of all the class intervals, including the two additional class intervals, on that line. In addition, the true lower limit of the lowest class interval (in this case, the additional class interval at the low end of the original distribution) is also marked on the X axis as the true upper limit of the next lower interval with a smoothed cP of 0. From Table 2.12, it is evident that the lowest class interval is 48-50, whose true lower limit is 47.5.
The smoothed cumulative percentage frequencies (ScP) are scaled along the Y axis of the Ogive (see Figure 2.9). The size of the graph should follow the 75% rule.

(g) The smoothed cumulative percentage frequency (ScP) of each class interval is then plotted against its true upper limit (see Figure 2.9). The first point on the smoothed Ogive is placed 2.1 Y units just above 50.5, the second point is 7.1 Y units just above 53.5 and so on. The last point is 100 Y units just above 74.5, the true upper limit of the highest class interval, which is the additional class interval at the high end of the distribution (see Table 2.12).

(h) The plotted points are then joined by straight lines to complete the smoothed cumulative percentage curve or smoothed Ogive. In order to have the curve begin on the X axis, it is started at 47.5 (the true upper limit of the class interval 45-47 or the true lower limit of the class interval 48-50, given in Table 2.12), the ScP of which is 0 (see Figure 2.9).

There are two important differences between the cumulative percentage frequency curve and the smoothed cumulative percentage frequency curve: (i) in drawing the cP curve, two additional class intervals, located, respectively, just below and just above the range of original class intervals and each having zero frequency, are not needed, whereas in drawing the smoothed cP curve these two additional class intervals are needed and are scaled on the X axis of the graph; and (ii) in the cP curve the cumulative percentage frequencies of the class intervals are plotted against the true upper limits of the respective intervals, whereas in the smoothed cP curve the smoothed cumulative percentage frequencies of the class intervals are plotted against the true upper limits of the respective intervals.
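The three-interval averaging described in step (e) can be sketched in Python as follows. This is a minimal illustration, not the book's own procedure in code form: the cP values are copied from Table 2.12 (already rounded to one decimal place there), and the function name `smooth` is hypothetical.

```python
# cP values of Table 2.12, lowest interval (48-50) first, including the two
# added zero-frequency intervals at either end of the distribution.
true_upper_limits = [50.5, 53.5, 56.5, 59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
cumulative_pct = [0.0, 6.3, 15.0, 32.5, 67.5, 86.3, 96.3, 100.0, 100.0]


def smooth(cp_values):
    """Average each cP with the cP of the next lower and next higher interval.

    Beyond the extremes, the cP below the lowest interval is taken as 0.0 and
    the cP above the highest interval as 100.0, as stated in the text.
    """
    padded = [0.0] + list(cp_values) + [100.0]
    return [
        round((padded[i - 1] + padded[i] + padded[i + 1]) / 3, 1)
        for i in range(1, len(padded) - 1)
    ]


smoothed = smooth(cumulative_pct)

for x, scp in zip(true_upper_limits, smoothed):
    print(x, scp)
```

The values in `smoothed` reproduce the Smoothed cP column of Table 2.12, for example 7.1 at 53.5 and 2.1 at 50.5.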
2.4.5.2 Uses of the Cumulative Percentage Frequency Curve or Ogive

The following are some of the uses of the cumulative percentage frequency curve or Ogive:

(i) Statistics such as the median, quartiles, quartile deviations, deciles, percentiles and percentile ranks may be determined quickly and fairly accurately from the Ogive.

(ii) Percentile norms (a type of norm representing the typical performance of some designated group or groups) may be easily and accurately determined from the Ogive.

(iii) A useful overall comparison of two or more groups is provided when Ogives representing their scores on a given test are plotted upon the same coordinate axes.

2.5 Other Graphic Methods

In addition to the four important graphic methods of representing data that we have already discussed, there are several other graphic methods that deal with data showing the changes attributable to growth, practice, learning, fatigue and so on. These types of data are mostly ungrouped data (data not grouped into a frequency distribution). Widely used devices are the line graph, the bar diagram or bar graph and the pie diagram or circle graph. These are illustrated in the following sections.

2.5.1 The Line Graph

The line graph is a simple mathematical graph that is drawn on graph paper by plotting the independent variable on the horizontal or X axis and the dependent variable on the vertical or Y axis. With the help of such graphs, the effect of one variable upon another variable during an experiment or normative study may be clearly demonstrated. The construction of a line graph can be better understood through the following example.
Example 2.9 A word-learning task consisting of 20 words was administered to a student of Class I to demonstrate the effect of practice on learning. He was given 10 trials, each word learned carried a score of 1, and the following result was obtained:

Trial No.   1   2   3   4   5   6    7    8    9    10
Scores      3   4   6   8   9   10   10   13   12   15

In order to draw the line graph on a piece of graph paper for the data of the 10 trials given above, first of all, the independent variable (trials) is represented on the horizontal or X axis and the dependent variable (scores, i.e., the number of words learned) is marked off on the vertical or Y axis of the graph. Then the scores of the student are plotted against the respective trials in accordance with the measures marked off on the Y axis. The plotted points are then joined by straight lines, and the lower end of the graph is joined to zero, the meeting point of the X and Y axes, to complete the line graph (see Figure 2.10).

Figure 2.10 Line graph: The effect of practice on learning (X axis: trials; Y axis: scores)

2.5.2 The Bar Diagram

In a bar diagram, data are represented by bars or columns. Generally, these diagrams are drawn on graph paper. Therefore, bar diagrams are also referred to as bar graphs. A bar diagram consists of one or more sets of bars or columns, used frequently for the graphical representation and comparison of frequency distributions, particularly of attributes and discrete variables. Bar diagrams may be conveniently used for comparing the class-wise frequency distribution of a variable in one or more samples.

Bar diagrams are usually available in two forms: vertical and horizontal. While constructing both these forms, the lengths of the bars are kept proportional to the amount of the variable or trait (i.e., height, intelligence, number of individuals, educational achievements and so on) possessed. The width of the bars is not governed by any set of rules. It is an arbitrary factor.
Similarly, the space between two bars is also arbitrary in size. Bar diagrams are generally of three types: simple bar diagram, multiple bar diagram and proportional bar diagram. These bar diagrams are discussed below.

Simple bar diagram. A simple bar diagram consists of a set of several parallel bars or rectangles, one for each group or class of the variable. The bars may be vertical or horizontal, have equal widths chosen arbitrarily by the researcher and do not overlap with each other. The bars are separated from each other by small intervening gaps, indicating that real gaps exist between the classes of the variable. The intervening gaps are, however, arbitrary in size. Frequencies, amounts or percentages are scaled parallel to the lengths of the bars, starting from a zero value to avoid any misleading impression about the relative lengths of the bars. The length or height of each bar is made to correspond to the frequency, amount or percentage in the relevant class. Because the widths of the bars are of equal size, their areas are directly
