Concept Notes - Basics of Statistics Summarization Data and Frequency Distribution Lyst7964

This document discusses basics of statistics and summarization of data through frequency distribution. It covers various topics related to collecting and processing data including ways to collect data through surveys using questionnaires, conducting pilot and actual surveys, sampling techniques, processing raw data, and creating frequency distributions. Frequency distributions summarize data by grouping it into intervals and counting the frequency of observations within each interval.

Uploaded by

nisarg
Copyright
© © All Rights Reserved

Basics of Statistics, Summarization of Data and Frequency Distribution

https://rbigradeb.wordpress.com Page |1 http://www.edutap.co.in


Contents
1 Introduction
2 Data in Statistics
2.1 Problems with Secondary Data
3 Ways to Collect Data?
3.1 Questionnaire in Surveys
3.2 Close ended or Open ended Questions in Surveys
3.3 How the Data is collected in Surveys?
3.4 Conducting the Pilot Survey
3.5 Conducting the Actual Survey
4 How to do Sampling?
5 Processing the Raw Data
6 Frequency Distribution
6.1 More Examples of Frequency Distribution
7 Some Common Terms
8 Adjusting the class interval
9 What is Bivariate Frequency Distribution?
10 Frequency Diagrams
11 Cumulative Frequency
11.1 Cumulative Frequency for Grouped Data
12 Frequency Diagram for Cumulative Frequency
13 Types of Statistics
13.1 Descriptive Statistics
13.2 Inferential Statistics

1 Introduction
Every day we come across a wide variety of information in the form of facts, numerical figures,
tables, graphs, etc. These are provided by newspapers, television, magazines and other means
of communication. These may relate to cricket batting or bowling averages, profits of a
company, temperatures of cities, expenditures in various sectors of a five year plan, polling
results, and so on. These facts or figures, which are numerical or otherwise, collected with a
definite purpose, are called data.

Our world is becoming more and more information oriented. Every part of our lives utilises data
in one form or the other. So, it becomes essential for us to know how to extract meaningful
information from such data. This extraction of meaningful information is studied in a branch of
mathematics called Statistics.

2 Data in Statistics
All of statistics is based on data. Suppose your teacher asks you to find the average
marks in the science subject for the full class. Before you can find the average, you would have
to collect data. The data can be collected in two ways:

Types of Data: Primary Data, Secondary Data

1. Primary Data: If you yourself go and ask each and every student about their marks in
science, you are collecting the data yourself. Such data are called Primary Data, as they
are based on first-hand information.

2. Secondary Data: You can also go to the teacher and ask for the register in which the marks for
science are recorded. Data obtained in this way are called Secondary Data.
If the data have been collected and processed (scrutinised and tabulated) by some other
agency, they are called Secondary Data. Similarly, if you use data from research
conducted by someone else, that would also be secondary data.

2.1 Problems with Secondary Data


Secondary sources are very helpful in conducting research but there are some problems
associated with the use of these sources. The most basic issue is always the
validity and reliability of the source from which the data is taken. Primary sources like
experiments are generally more reliable and valid than secondary sources. The problems listed below
can be reduced to some extent, but not always eliminated.

1. Validity and reliability: Validity and reliability are very important concerns in research
and they cannot be taken for granted. Some secondary sources are as reliable as
primary sources, such as a census, which covers the whole population. Other sources might not
be as reliable and should only be used when no other data is available. Valid
means that the data represents original and true findings and has been collected using
scientific methods. Before using secondary sources of information, it should be carefully
checked that the content is genuine and authentic.

2. Personal bias: In secondary sources the chances of bias are higher than
in primary sources. Some secondary sources, like personal records, can be highly biased,
although not all of them are. Personal diaries and other records such as newspapers and mass media
products can be biased. Newspapers, magazines and websites do not use rigorous and
well-controlled methods in documentation. Most of the time such writings are
opinion-based and far from facts. In these publications writers can distort the facts to
make the situation look better or worse.

3. Format of data: In secondary sources the format of the data should also be seen before
using it in the research. The format of the data can be totally different and the
researcher cannot use it in his research. Using another format in data collection that is
not related to your research format can give biased and invalid results.

4. Quality of data: Quality of the data is related to its accuracy, and accuracy comes with
rigour in collecting the data. It depends on the source that you are using in your
research; books and journals can provide you quality data. There might be some
secondary sources that cannot provide high quality data. Again, newspapers and
magazines cannot provide good data for research, so they should be avoided.

5. Obsolete data: Sometimes secondary sources are available to be used in the research
but they are very old. Old data is of little use in research. You cannot rely on a
book that was written 20 years ago: the data in that book may have been valid and
reliable at the time it was written, but under current circumstances it is
obsolete.

3 Ways to Collect Data?
There are many methods used to collect or obtain data for statistical analysis. Three of the
most popular methods are:

1. Direct Observation
2. Experiments
3. Surveys

Surveys are the most important tool, and they are the method discussed in this document.

3.1 Questionnaire in Surveys


The most common type of instrument used in surveys is the questionnaire/interview schedule. The
questionnaire is either self-administered by the respondent or administered by the researcher
(enumerator) or a trained investigator. While preparing the questionnaire/interview schedule,
you should keep in mind the following points:

1. The questionnaire should not be too long. The number of questions should be as
few as possible. Long questionnaires discourage people from answering.

2. The series of questions should move from general to specific. The questionnaire should
start from general questions and proceed to more specific ones. This helps the
respondents feel comfortable. For example:

Poor Q
(i) Is increase in electricity charges justified?
(ii) Is the electricity supply in your locality regular?

Good Q
(i) Is the electricity supply in your locality regular?
(ii) Is increase in electricity charges justified?

3. The question should not be a leading question, which gives a clue about how the
respondent should answer. For example, in the question below the phrase "high-quality"
leads the respondent towards a favourable answer.

Poor Q
How do you like the flavour of this high-quality tea?

Good Q
How do you like the flavour of this tea?

3.2 Close ended or Open ended Questions in Surveys


1. Closed Ended Questions are those which can be answered with yes or no, or with a short
phrase.

• Advantages: time-efficient; responses are easy to code and interpret; ideal
for quantitative type of research
• Disadvantages: respondents may be required to choose a response that does not exactly
reflect their answer; the researcher cannot further explore the meaning of the
responses

Example
Do you think India would win the Match? (Yes or No)

2. In open-ended questions, there are no predefined options or categories included.
The participants supply their own answers.

• Advantages: participants can respond to the questions exactly as they would like
to answer them; the researcher can investigate the meaning of the responses; ideal
for qualitative type of research
• Disadvantages: time-consuming; responses are difficult to code and interpret

Example
What is your view on Indian captain Virat Kohli’s form?

3.3 How the Data is collected in Surveys?

There are four basic ways of collecting data in surveys: Personal Interview, Telephone Interview,
Mailed Questionnaire and Focus Groups.

In-person Interviewing or Personal Interview


When you use this method, you meet with the respondents face to face and ask questions.
In-person interviewing offers several advantages. This technique has excellent response rates and
enables you to conduct interviews that take a longer amount of time. Another benefit is that you
can ask follow-up questions to responses that are not clear.

In-person interviews do have disadvantages of which you need to be aware. First, this method
is expensive and takes more time because of interviewer training, transport, and remuneration.
A second disadvantage is that some areas of a population, such as neighborhoods prone to
crime, cannot be accessed which may result in bias.

Telephone Interviewing
Using this technique, you call respondents over the phone and interview them. This method
offers the advantage of quickly collecting data, especially when used with computer-assisted
telephone interviewing. Another advantage is that collecting data via telephone is cheaper than
in-person interviewing.

One of the main limitations of telephone interviewing is that it is hard to gain the trust of
respondents. For this reason, you may not get responses or may introduce bias. Since phone
interviews are generally kept short to reduce the possibility of upsetting respondents, this
method may also limit the amount of data you can collect.

Mailed Questionnaire
When you use this interviewing method, you send a printed questionnaire to the postal address
of the respondent. The participants fill in the questionnaire and mail it back. This interviewing
method gives you the advantage of obtaining information that respondents may be unwilling to
give when interviewing in person.

The main limitation of mailed questionnaires is that you are likely to get a low response rate. Keep
in mind that inaccurate mailing addresses and delays or loss of mail could also affect the response
rate. Additionally, mailed questionnaires cannot be used to interview respondents with low
literacy, and you cannot seek clarifications on responses.

Focus Groups
When you use a focus group as a data collection method, you identify a group of 6 to 10 people
with similar characteristics. A moderator then guides a discussion to identify attitudes and
experiences of the group. The responses are captured by video recording, voice recording or
writing; this is the data you will analyze to answer your research questions. Focus groups have
the advantage of requiring fewer resources and time as compared to interviewing individuals.
Another advantage is that you can request clarifications to unclear responses.

One disadvantage you face when using focus groups is that the sample selected may not
represent the population accurately. Furthermore, dominant participants can influence the
responses of others.

3.4 Conducting the Pilot Survey


Before the actual survey is conducted using any of the above methods, it is advisable to
conduct a pilot survey.

Pilot Survey

Once the questionnaire is ready, it is advisable to conduct a try-out with a small group, which is
known as a Pilot Survey or Pre-Testing of the questionnaire. The pilot survey helps in providing a
preliminary idea about the survey. It helps in pre-testing of the questionnaire, so as to know the
shortcomings and drawbacks of the questions. A pilot survey also helps in assessing the suitability
of questions, clarity of instructions, performance of enumerators and the cost and time
involved in the actual survey.

3.5 Conducting the Actual Survey


The actual survey can be conducted in two ways:

1. Getting responses from the entire population
2. Getting responses from only a sample of the population

Suppose you want to study the popularity of a film star among the people living in a
particular state who are between 20 and 30 years old.

The population in this example consists of all the people in the state who are between 20 and
30. It would be nearly impossible to talk to so many people. So in this case a sample may be
chosen from the population, i.e. a few people would be chosen and asked for their response about the
film star.

4 How to do Sampling?
There are two ways to select a sample from the population: Random Sampling and Non-Random Sampling.


Random Sampling
As the name suggests, random sampling is one where the individual units of the sample are
selected from the population at random. Suppose the government wants to determine the impact of the rise in
petrol price on the household budget of a particular locality. For this, a representative (random)
sample of 30 households has to be taken and studied. The names of all the 300 households of
that area are written on pieces of paper and mixed well, and then the 30 names to be interviewed
are drawn one by one.
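The paper-slip draw described above can be imitated on a computer. The following is only a minimal sketch: the household IDs 1 to 300 and the fixed seed are assumptions made for illustration, and Python's standard `random.sample` stands in for mixing and drawing the slips.

```python
import random

# Hypothetical IDs 1..300 standing in for the 300 household names on paper slips.
households = list(range(1, 301))

# A fixed seed is used only so that the draw can be repeated; it is an assumption.
rng = random.Random(42)

# Draw 30 households without replacement; every household is equally likely.
sample = rng.sample(households, 30)

print(sorted(sample))
```

Because the draw is without replacement, no household can appear twice, just as a slip cannot be drawn twice from the bowl.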

In random sampling, every individual has an equal chance of being selected, and the
individuals who are selected are just like the ones who are not selected. In the above example,
all the 300 sampling units of the population (the complete list of units is called the sampling frame) got an equal chance of
being included in the sample of 30 units, and hence the sample so drawn is a random
sample. This is also called the lottery method. Some examples of Random Sampling Techniques are
given below.

Random Sampling Techniques: Simple Random Sampling, Systematic Random Sampling, Stratified Random Sampling, Cluster Sampling

1. Simple random sample: Each unit in the population is identified, and each unit has an
equal chance of being in the sample. The selection of each unit is independent of the
selection of every other unit. Selection of one unit does not affect the chances of any
other unit.

For example, to select a sample of 25 people who live in your college dorm, make a list
of all the 250 people who live in the dorm. Assign each person a unique number,
between 1 and 250. Then refer to a table of random numbers. Starting at any point in
the table read across or down and note every number that falls between 1 and 250. Use
the numbers you have found to pull the names from the list that correspond to the 25
numbers you found. These 25 people are your sample. This is called the table of random
numbers method.

Another way to select this simple random sample is to take 250 ping-pong balls and
number them from 1 to 250. Put them into a large barrel and mix them up, and then
grab 25 balls. Read off the numbers. Those are the 25 people in your sample. This is
called the lottery method.

2. Systematic random sampling: Each unit in the population is identified, and each unit
has an equal chance of being in the sample.

For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all
the room numbers in the dorm. Say there are 100 rooms. Divide the total number of
rooms (100) by the number of rooms you want in the sample (25). The answer is 4. This
means that you are going to select every fourth dorm room from the list. But you must
first consult a table of random numbers. Pick any point on the table, and read across or
down until you come to a number between 1 and 4. This is your random starting point.
Say your random starting point is "3". This means you select dorm room 3 as your first
room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25
rooms selected.

This method is useful for selecting large samples, say 100 or more. It is less cumbersome
than a simple random sample using either a table of random numbers or a lottery
method. For example, you might have to sample files in a large filing cabinet. It is easier
to select every 17th file than to pull out all the files and number them, etc.

However, you must be aware of problems that can arise in systematic random sampling.
If the selection interval matches some pattern in the list (e.g., each 4th dorm room is a
single unit, where all the others are doubles) you will introduce systematic bias into
your sample.

3. Stratified random sampling: Each unit in the population is identified, and each unit has
a known, non-zero chance of being in the sample. This is used when the researcher
knows that the population has sub-groups (strata) that are of interest.

For example, if you wanted to find out the attitudes of students on your campus about
immigration, you may want to be sure to sample students who are from every region of
the country as well as foreign students. Say your student body of 10,000 students is
made up of 8,000 - West; 1,000 - East; 500 - Midwest; 300 - South; 200 - Foreign.

If you select a simple random sample of 500 students, you might not get any from the
Midwest, South, or Foreign. To make sure that you get some students from each group,
you can divide the students into these five groups, and then select the same percentage
of students from each group using a simple random sampling method. This is
proportional stratified random sampling. So if you choose 10% from each group, then
you would choose 800 - West; 100 - East; 50 - Midwest; 30 - South; 20 - Foreign. The
proportions in the sample remain the same as those in the original population, which is why
this is called proportional stratified random sampling.

However, you may still have too few of some types of students. Instead, you may divide
students into the five groups and then select the same number of students from each
group using a simple random sampling method. This is disproportionate stratified
random sampling. This allows you to have enough students in each sub-group so that
you can perform some meaningful statistical analyses of the attitudes of students in
each sub-group. In order to say something about the attitudes of the total student
population of the university, however, you will have to apply weights to the findings for
each sub-group, proportional to its presence in the total student body.

4. Cluster sampling: Cluster sampling views the units in a population not only as
members of the total population but also as members of naturally occurring clusters
within the population. For example, city residents are also residents of neighbourhoods,
blocks, and housing structures.

Cluster sampling is used in large geographic samples where no list is available of all the
units in the population but the population boundaries can be well-defined. For example,
to obtain information about the drug habits of all high school students in a state, you
could obtain a list of all the school districts in the state and select a simple random
sample of school districts. Then, within each selected school district, list all the high
schools and select a simple random sample of high schools. Within each selected high
school, list all high school classes, and select a simple random sample of classes. Then
use the high school students in those classes as your sample.

Cluster sampling must use a random sampling method at each stage. This may result in
a somewhat larger sample than using a simple random sampling method, but it saves
time and money. It is also cheaper to administer than a statewide sample of high school
seniors, because there are many fewer sites to obtain information from.
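Two of the techniques above, systematic sampling of dorm rooms and proportional stratified sampling of students, can be sketched in a few lines. This is an illustrative sketch only: the function names, the generated student labels and the fixed seed are assumptions, while the figures (100 rooms, 25 to select, five regional groups, 10% from each) come from the examples in the text.

```python
import random

def systematic_sample(units, sample_size, rng):
    """Pick a random start in the first interval, then take every k-th unit."""
    k = len(units) // sample_size      # sampling interval, e.g. 100 // 25 = 4
    start = rng.randint(1, k)          # random starting point between 1 and k
    return units[start - 1::k][:sample_size]

def proportional_stratified_sample(strata, fraction, rng):
    """Draw the same fraction from every stratum by simple random sampling."""
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

rng = random.Random(0)  # fixed seed, assumed only for repeatability

# Systematic sampling: 25 rooms out of 100, so every 4th room is taken.
rooms = list(range(1, 101))
chosen_rooms = systematic_sample(rooms, 25, rng)
print(chosen_rooms[:5])

# Proportional stratified sampling: 10% from each regional group.
sizes = {"West": 8000, "East": 1000, "Midwest": 500, "South": 300, "Foreign": 200}
strata = {name: [f"{name}-{i}" for i in range(n)] for name, n in sizes.items()}
stratified = proportional_stratified_sample(strata, 0.10, rng)
print({name: len(members) for name, members in stratified.items()})
# prints {'West': 800, 'East': 100, 'Midwest': 50, 'South': 30, 'Foreign': 20}
```

The systematic sample depends on a single random number (the starting point), which is why it is quicker than numbering and drawing every unit, and also why a pattern in the list can bias it.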

Non - Random Sampling


There may be a situation where you have to select 10 out of 100 households in a locality. You
have to decide which households to select and which to reject. You may select the households
conveniently situated or the households known to you or your friend. In this case, you are using
your judgement (bias) in selecting the 10 households. This way of selecting 10 out of 100 households
is not a random selection. In a non-random sampling method, the units of the population do
not all have an equal chance of being selected, and the convenience or judgement of the investigator
plays an important role in the selection of the sample. The units are mainly selected on the basis of
judgement, purpose, convenience or quota. Some types of
Non-Random sampling techniques are:

Non-Random Sampling Techniques: Convenience sample, Purposive sample, Quota sample

1. Convenience sample: also called an "accidental" sample or "man-in-the-street" samples.
The researcher selects units that are convenient, close at hand, easy to reach, etc.

2. Purposive sample: the researcher selects the units with some purpose in mind, for
example to study the cleanliness in schools he selects students who live in dorms on
campus.

3. Quota sample: the researcher constructs quotas for different types of units. For
example, to interview a fixed number of shoppers at a mall, half of whom are male and
half of whom are female.

5 Processing the Raw Data


The data that you get after completing the survey is raw data.

Example: The sale of shoes of various sizes at a shop, on a given day is given below:

7 8 5 4 9 8 5 7 6 8 9 6 7 9

8 7 9 9 6 5 8 9 4 5 5 8 9 6

Like the kabadiwallah’s junk, the unclassified data or raw data are highly disorganised. They are
often very large and cumbersome to handle.

To draw meaningful conclusions from them is a tedious task because they do not yield to
statistical methods easily. Therefore proper organisation and presentation of such data is
needed before any systematic statistical analysis is undertaken. Hence after collecting data the
next step is to organise and present them in a classified form

The raw data are summarised, and made comprehensible by classification. When facts of
similar characteristics are placed in the same class, it enables one to locate them easily, make
comparison, and draw inferences without any difficulty

One way to classify raw data is a frequency distribution.

6 Frequency Distribution

A frequency can be defined as how often something happens. For example, the number of dogs
that people own in a neighborhood is a frequency.

A distribution refers to the pattern of these frequencies. A frequency distribution looks at how
frequently certain things happen within a sample of values.

A frequency distribution can be constructed for ungrouped or grouped data.

1. Frequency Distribution for Ungrouped Data

Example: The sale of shoes of various sizes at a shop, on a given day is given below:

7 8 5 4 9 8 5 7 6 8 9 6 7 9

8 7 9 9 6 5 8 9 4 5 5 8 9 6

The frequency distribution for this ungrouped data is shown below:

Shoe size    Sale (Frequency)
4            2
5            5
6            4
7            4
8            6
9            7

Looking at this table, we can easily tell that 6 pieces were sold in shoe size 8. We can also
tell that the fewest shoes were sold in shoe size 4.
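Counting how often each shoe size occurs is mechanical, so it is easy to check by computer. The sketch below uses the 28 sales listed in the example and Python's standard `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# The 28 shoe sizes sold, copied from the example above.
sales = [7, 8, 5, 4, 9, 8, 5, 7, 6, 8, 9, 6, 7, 9,
         8, 7, 9, 9, 6, 5, 8, 9, 4, 5, 5, 8, 9, 6]

# Counter maps each distinct shoe size to the number of times it occurs.
frequency = Counter(sales)

print("Shoe size  Sale (Frequency)")
for size in sorted(frequency):
    print(f"{size:<10} {frequency[size]}")
```

The frequencies always add up to the number of observations, here 28, which is a quick check that no sale was missed or counted twice.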

2. Grouped Data
To put the data in a more condensed form, we make groups of suitable size, and
mention the frequency of each group. Such a table is called a grouped frequency
distribution table.

Class-Interval: Each class is bounded by two figures, which are called class limits. The
figure on the left side of a class is called its lower limit and that on its right is called
its upper limit.

Types of Grouped Frequency Distribution

1. Exclusive Form (or Continuous Interval Form): A frequency distribution in which the
upper limit of each class is excluded and lower limit is included, is called an exclusive
form.

Example: Suppose the marks obtained by some students in an examination are given.
We may consider the classes 0 – 10, 10 – 20 etc.

In class 0 – 10, we include 0 and exclude 10. In class 10 – 20, we include 10 and
exclude 20.

2. Inclusive Form (or Discontinuous Interval Form): A frequency distribution in which
each upper limit as well as lower limit is included is called an inclusive form. Thus, we
have classes of the form 0 – 10, 11 – 20, 21 – 30 etc.

In class 0 – 10, both 0 and 10 would be included, and in class 11 – 20 again both 11 and
20 would be included.

Example:

Given below are the marks obtained by 40 students in an examination:

3, 25, 48, 23, 17, 13, 11, 9, 46, 41, 37, 45, 10, 19, 39, 36, 34, 5, 17, 21,

39, 33, 28, 25, 12, 3, 8, 17, 48, 34, 15, 19, 32, 32, 19, 21, 28, 32, 20, 23,

Arrange the data in ascending order and present it as grouped data in:

(i) Discontinuous Interval form, taking class-intervals 0 – 9, 10 – 19, etc.

(ii) Continuous Interval form, taking class-intervals 0 – 10, 10 – 20, etc.

Solution

Arranging the marks in ascending order, we get:

3, 3, 5, 8, 9, 10, 11, 12, 13, 15, 17, 17, 17, 19, 19, 19, 20, 21, 21, 23, 23, 25, 25, 28, 28, 32, 32, 32,
33, 34, 34, 36, 37, 39, 39, 41, 45, 46, 48, 48

Discontinuous Interval Form (or Inclusive Form)

Class Interval    Frequency
0-9               5
10-19             11
20-29             9
30-39             10
40-49             5
Total             40

Note that the class 0 – 9 means marks obtained from 0 to 9, including both.

Continuous Interval Form (or Exclusive Form)

Class Interval    Frequency
0-10              5
10-20             11
20-30             9
30-40             10
40-50             5
Total             40

Here, the class 0 – 10 means marks obtained from 0 up to, but not including, 10.
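The difference between the inclusive and exclusive forms is only in whether the upper limit is counted. The following sketch groups the 40 marks from the example both ways; the helper names `inclusive_counts` and `exclusive_counts` are hypothetical and introduced here purely for illustration.

```python
# The 40 marks from the example above.
marks = [3, 25, 48, 23, 17, 13, 11, 9, 46, 41, 37, 45, 10, 19, 39, 36, 34, 5, 17, 21,
         39, 33, 28, 25, 12, 3, 8, 17, 48, 34, 15, 19, 32, 32, 19, 21, 28, 32, 20, 23]

def inclusive_counts(data, classes):
    """Inclusive form: count values with lower <= x <= upper."""
    return {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in classes}

def exclusive_counts(data, classes):
    """Exclusive form: count values with lower <= x < upper."""
    return {(lo, hi): sum(lo <= x < hi for x in data) for lo, hi in classes}

# Classes 0-9, 10-19, ..., 40-49 (inclusive) and 0-10, 10-20, ..., 40-50 (exclusive).
inclusive = inclusive_counts(marks, [(lo, lo + 9) for lo in range(0, 50, 10)])
exclusive = exclusive_counts(marks, [(lo, lo + 10) for lo in range(0, 50, 10)])

for (lo, hi), count in inclusive.items():
    print(f"{lo}-{hi}: {count}")
for (lo, hi), count in exclusive.items():
    print(f"{lo}-{hi}: {count}")
```

Since all the marks here are whole numbers, both forms give the same frequencies; they would differ only if a mark such as 9.5 fell in the gap between two inclusive classes.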

6.1 More Examples of Frequency Distribution

Example 1

Consider the marks obtained (out of 100 marks) by 30 students of Class IX of a school:

As discussed earlier, the number of students who have obtained a certain number of marks is
called the frequency of those marks. For instance, 4 students got 70 marks. So the frequency of
70 marks is 4. The frequency distribution for this ungrouped data is shown below

Example 2

100 plants each were planted in 100 schools during Van Mahotsava. After one month, the
number of plants that survived were recorded as

To present such a large amount of data so that a reader can make sense of it easily, we
condense it into groups like 20-29, 30-39, . . ., 90-99 (since our data is from 23 to 98).

The tally marks column is simply a way of counting. After every four tally marks, the fifth one is drawn
as a stroke across the previous four, marking a group of 5 observations in that range. Tally marks are
meant for school students counting by hand; they will be of little use to you.

7 Some Common Terms


Let’s take the frequency distribution below as an example:

Class Interval    Frequency
0-10              5
10-20             11
20-30             9
30-40             10
40-50             5
Total             40

1. Class: Each interval is called a class. For example, 0-10 is a class.

2. Class Limits: The values on the boundaries of a class are called class limits. The lower end is
called the lower class limit and the higher end is called the upper class limit. For example, in
class 0-10 the lower class limit is 0 and the upper class limit is 10.

3. Class Interval or Class Width is the difference between the upper class limit and the lower
class limit. For example, in class 0-10 the class interval is 10 – 0 = 10.

4. The Class Mid-Point or Class Mark is the middle value of a class. It lies halfway between the
lower class limit and the upper class limit of a class and can be ascertained in the following
manner:

Class Mark = (Upper Class Limit + Lower Class Limit) / 2

For example, the class mark of class 0-10 is (10 + 0) / 2 = 5.

5. The class mark or mid-value of each class is used to represent the class. Once raw data are
grouped into classes, individual observations are not used in further calculations. Instead,
the class mark is used.
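The terms above can be checked against the example table with a short Python sketch. The function names are hypothetical, and the final line is only an illustration of using class marks in place of individual observations.

```python
# Classes and frequencies from the example table above
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 11, 10, 9, 5]

def class_width(lower, upper):
    """Class width = upper class limit - lower class limit."""
    return upper - lower

def class_mark(lower, upper):
    """Class mark = (upper class limit + lower class limit) / 2."""
    return (lower + upper) / 2

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower}-{upper}: width={class_width(lower, upper)}, "
          f"mark={class_mark(lower, upper)}, frequency={f}")

# Each class is represented by its class mark, so an approximate
# total of all marks is the sum of frequency x class mark.
approx_total = sum(f * class_mark(lo, up) for (lo, up), f in zip(classes, frequencies))
print(approx_total)
```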

8 Adjusting the class interval


Let’s take an example of a grouped frequency distribution in Discontinuous Interval Form

Discontinuous Interval Form (or Inclusive Form)

Class Interval    Frequency
0-9               5
10-19             11
20-29             10
30-39             9
40-49             5
Total             40

A point to note in this example is that there is a gap of 1 mark between consecutive intervals,
i.e. the first interval ends at 9 but the second interval starts at 10. So what if, in future, a
student obtains 9.4 or 9.5 marks? There is no provision to record 9.4 or 9.5 marks with the current
class intervals.

The solution is to adjust the class intervals:

1. Find the difference between the lower limit of the second class and the upper limit of
the first class. In this case it would be 10 – 9, i.e. 1.

2. Divide the difference obtained in step 1 by two, i.e. 1/2 = 0.5.

3. Subtract the value obtained in step 2 from the lower limits of all classes (lower class limit – 0.5).

4. Add the value obtained in step 2 to the upper limits of all classes (upper class limit + 0.5).

The result, shown below, is in continuous interval form

Class Interval    Frequency
-0.5 – 9.5        5
9.5 – 19.5        11
19.5 – 29.5       10
29.5 – 39.5       9
39.5 – 49.5       5
Total             40
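The four adjustment steps can be written as a small Python function. This is a minimal sketch: the function name `make_continuous` is hypothetical, and it assumes that the gap between consecutive classes is the same throughout the table.

```python
def make_continuous(classes):
    """Convert inclusive (discontinuous) classes into continuous ones."""
    gap = classes[1][0] - classes[0][1]  # step 1: lower limit of 2nd class - upper limit of 1st
    half = gap / 2                       # step 2: divide the difference by two
    # steps 3 and 4: subtract from every lower limit, add to every upper limit
    return [(lower - half, upper + half) for lower, upper in classes]

inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
print(make_continuous(inclusive))
```

The frequencies are unchanged by the adjustment; only the class limits move.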

9 What is Bivariate Frequency Distribution?


The frequency distribution of a single variable is called a Univariate Distribution. The examples
discussed till now show the univariate distribution of the single variable “marks of a student”. A
Bivariate Frequency Distribution is the frequency distribution of two variables.

The table below shows the frequency distribution of two variables, sales and advertisement
expenditure (in Rs. lakhs), of 20 companies.

The values of sales are classed in different columns and the values of advertisement
expenditure are classed in different rows. Each cell shows the number of firms that fall in
both the corresponding row class and column class.

For example, there are 3 firms whose sales are between Rs 135–145 lakhs and whose
advertisement expenditures are between Rs 64–66 thousand.

A bivariate frequency distribution is used in finding the correlation between two variables,
which we shall study later.
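A bivariate frequency table is a count of observations per (row class, column class) pair. The sketch below illustrates this with Python's `collections.Counter`. The five data pairs, the class starting points and the helper name `class_of` are all hypothetical and are not taken from the table discussed above.

```python
from collections import Counter

def class_of(value, start, width):
    """Return the (lower, upper) exclusive class containing value."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

# Hypothetical (sales, advertisement expenditure) pairs for five firms
pairs = [(118, 62.5), (137, 64.2), (141, 65.0), (139, 65.9), (152, 66.4)]

# Rows: advertisement expenditure classes of width 2 starting at 62
# Columns: sales classes of width 10 starting at 115
table = Counter(
    (class_of(ad, 62, 2), class_of(sales, 115, 10))
    for sales, ad in pairs
)

# Number of firms with expenditure in 64-66 and sales in 135-145
print(table[((64, 66), (135, 145))])
```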

10 Frequency Diagrams
There are various ways to represent a frequency distribution

1. Histogram
It is one of the most popular and widely used methods of presenting a frequency
distribution. It is a graph in which the frequencies are represented by bars. The
histogram appears as a series of vertical bars placed one next to the other, with no
gaps between them.

Let’s take an example of marks scored by students in the table below. The class
intervals have been adjusted, as they were not continuous, and are listed in the table as
“True Class Limits”

The histogram plotted for the above data would be

Let’s take another example

The histogram plotted for the above data would be

Advantages:
1. It is simple and easily made.

2. All the advantages of the graphic representation as shown earlier are applicable here.

Limitations:
1. It is difficult to superimpose more than one histogram on the same graph.

2. Comparisons of several frequency distributions cannot readily be made via
histograms. Frequency polygons are much better suited for that purpose.

2. Frequency Polygon
A polygon is a many-angled closed figure. The frequency polygon is a graphic
representation of a frequency distribution in which the frequencies are plotted against
the midpoints of the class intervals and the plotted points are joined by straight lines.

Let’s take the data below as an example

The frequency polygon will look like below

3. Smoothed Frequency Polygon

A frequency polygon should be smoothed:

1. To iron out chance irregularities;

2. To get a better notion of how the figure may look if the data were more numerous;

3. To know how a polygon would look if grouping errors and sampling errors were removed
from it;

4. To ascertain the shape which it would take if it represented conditions freed from minor
accidental fluctuations.

In smoothing a frequency polygon, a series of moving or running averages is taken, from
which new or adjusted frequencies are determined. To find an adjusted or smoothed f, add
the f on the given interval and the f's on the two adjacent intervals (the interval just below
and the interval just above) and divide the sum by 3.

For example, the smoothed f for interval 170-174 is (8+10+6)/3 or 8.00
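The moving-average rule can be sketched in Python as follows. The function name `smooth` is hypothetical. The sketch also assumes that an end interval, which has only one neighbour, treats its missing neighbour as having a frequency of 0; the text above does not state how end intervals are handled.

```python
def smooth(frequencies):
    """Three-interval moving average of a list of class frequencies.
    Assumption: the missing neighbour of an end interval counts as 0."""
    padded = [0] + list(frequencies) + [0]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]

# The three frequencies used in the example above: 6, 10 and 8,
# with 10 being the frequency of interval 170-174
print(smooth([6, 10, 8])[1])
```

The middle value printed is the smoothed f for the interval with frequency 10, which is (6 + 10 + 8) / 3 = 8.0.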

The smoothened data is shown below

The smoothened curve is shown below

Advantages of Frequency Polygon / Smoothed Curve:

1. It is simple and easily made.

2. It is possible to superimpose more than one frequency polygon on the same graph by
using colored lines, broken lines, dotted lines, etc.

3. Comparisons of several frequency distributions can readily be made via frequency
polygons.

4. All the advantages of the graphic representation as discussed earlier are applicable here.

4. Pie Chart

Imagine you survey your friends to find the kind of movie they like best:

You can show the data by this Pie Chart:

It is a really good way to show relative sizes: it is easy to see which movie types are most
liked, and which are least liked, at a glance.

11 Cumulative Frequency

Suppose below are the marks obtained by students. As you can see, this is an example of an
ungrouped frequency distribution

Another way to look at this data is to ask how many students got up to 25 marks. The
answer would be 6 + 20 = 26. In the same way, the number of students getting up to 29 marks
would be 6 + 20 + 24 + 28 = 78.

The frequency obtained by adding all the frequencies up to and including the frequency of a value is
called the cumulative frequency for that value. The table can be shown like this
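The running totals can be reproduced with `itertools.accumulate` from the Python standard library. In this sketch the four frequencies and the marks 25 and 29 come from the text above; the marks 20 and 28 are assumed placeholders for the other two rows.

```python
from itertools import accumulate

marks = [20, 25, 28, 29]       # 25 and 29 are from the text; 20 and 28 are assumed
frequencies = [6, 20, 24, 28]  # frequencies quoted in the text

# Each cumulative frequency is the sum of all frequencies up to and including that row
cumulative = list(accumulate(frequencies))

for m, f, cf in zip(marks, frequencies, cumulative):
    print(f"marks={m}  frequency={f}  cumulative frequency={cf}")
```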

11.1 Cumulative Frequency for Grouped Data
Consider a grouped frequency distribution of marks obtained, out of 100, by 53 students, in a
certain examination.

From the table above, try to answer the following questions:

1. How many students have scored marks less than 10? The answer is clearly 5.

2. How many students have scored less than 20 marks? Observe that the number of
students who have scored less than 20 include the number of students who have scored
marks from 0 - 10 as well as the number of students who have scored marks from 10 -
20. So, the total number of students with marks less than 20 is 5 + 3, i.e., 8.

We say that the cumulative frequency of the class 10 -20 is 8.

Similarly, we can compute the cumulative frequencies of the other classes, i.e., the number of
students with marks less than 30, less than 40, . . ., less than 100.

The distribution given above is called the cumulative frequency distribution of the less than type.
Here 10, 20, 30, . . ., 100 are the upper limits of the respective class intervals.

We can similarly make the table for the number of students with scores, more than or equal to
0, more than or equal to 10, more than or equal to 20, and so on. We observe that all 53
students have scored marks more than or equal to 0. Since there are 5 students scoring marks
in the interval 0 - 10, this means that there are 53 – 5 = 48 students getting more than or equal
to 10 marks. Continuing in the same manner, we get the number of students scoring 20 or
above as 48 – 3 = 45, 30 or above as 45 – 4 = 41, and so on.

The table above is called a cumulative frequency distribution of the more than type. Here 0, 10,
20, . . ., 90 give the lower limits of the respective class intervals
Normally, the cumulative frequency table of the less than type is used to solve questions, as we
shall see later.
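Both types of cumulative frequency table can be built with a short Python sketch. The function names are hypothetical. Only the first three frequencies (5, 3, 4) and the total of 53 are quoted in the text above; the remaining frequencies are illustrative values chosen so that the total is 53.

```python
from itertools import accumulate

def less_than_type(upper_limits, frequencies):
    """Pairs of (upper limit, number of observations below that limit)."""
    return list(zip(upper_limits, accumulate(frequencies)))

def more_than_type(lower_limits, frequencies):
    """Pairs of (lower limit, number of observations at or above that limit)."""
    total = sum(frequencies)
    below = [0] + list(accumulate(frequencies))[:-1]
    return [(lower, total - b) for lower, b in zip(lower_limits, below)]

lower_limits = list(range(0, 100, 10))   # 0, 10, ..., 90
upper_limits = list(range(10, 110, 10))  # 10, 20, ..., 100
# First three values are from the text; the rest are illustrative (total = 53)
frequencies = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

print(less_than_type(upper_limits, frequencies))
print(more_than_type(lower_limits, frequencies))
```

The less than type starts at the frequency of the first class and ends at the total, while the more than type starts at the total and decreases.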

12 Frequency Diagram for Cumulative Frequency


We have seen earlier the frequency diagrams for an ordinary (non-cumulative) frequency distribution.
Here we shall see the frequency diagram for a cumulative frequency distribution

1. Cumulative Frequency Graph or Ogive


The cumulative frequency graph is another way of representing a frequency distribution
by means of a diagram. Before we can plot a cumulative frequency graph, the frequencies of
the distribution must be added serially, or cumulated.

To determine the cumulative frequency for each row, we go on adding the f's
progressively from the bottom. To illustrate, in the distribution of scores the first
cumulative frequency is 1; 1 + 3, from the low end of the distribution, gives 4 as the next
entry; 4 + 2 = 6; 6 + 4 = 10, etc. The last cumulative f is equal, of course, to 50 or N, the
total frequency.

In plotting the frequency polygon, the frequency on each interval is taken at the mid-
point of the class interval. But in constructing a cumulative frequency curve, each
cumulative frequency is plotted at the exact upper limit of the interval upon which it
falls.

Let us take the data below as an example. Ignore the last column for now. We will discuss
it later

The upper limit is 139.5 and not 139 because the class intervals given in the first column
are not continuous, so they need to be adjusted as discussed in the earlier section. The
new class intervals will be 134.5 – 139.5, 139.5 – 144.5, and so on.

The cumulative frequency curve shall look like below

The cumulative frequency graph is also called an Ogive.

Looking at this graph, we can directly tell that 20 people have scores less than 169.5.
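The points of an ogive can be computed with a short Python sketch. The function name `ogive_points` is hypothetical. The class intervals (140-144 up to 195-199) and the frequencies are illustrative assumptions, chosen only to agree with the figures quoted above: first cumulative entries 1, 4, 6, 10; 20 scores below 169.5; N = 50.

```python
from itertools import accumulate

def ogive_points(classes, frequencies, half_gap=0.5):
    """Return (exact upper limit, cumulative frequency) pairs for an ogive.
    The curve starts at zero at the exact lower limit of the first class."""
    points = [(classes[0][0] - half_gap, 0)]
    for (lower, upper), cf in zip(classes, accumulate(frequencies)):
        points.append((upper + half_gap, cf))
    return points

# Assumed inclusive classes 140-144, 145-149, ..., 195-199
classes = [(low, low + 4) for low in range(140, 200, 5)]
# Illustrative frequencies consistent with the figures quoted in the text
frequencies = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

points = ogive_points(classes, frequencies)
scores_below = dict(points)
print(scores_below[169.5])
```

Each cumulative frequency is paired with the exact upper limit of its class, so the value at 169.5 is the number of scores below 169.5.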

13 Types of Statistics
There are two types of statistics: Descriptive Statistics and Inferential Statistics.

13.1 Descriptive Statistics

Descriptive statistics is the type of statistical analysis which helps to describe the
data in some meaningful way. It is used to describe quantitatively the
important features of the data or information. Descriptive statistics gives
summaries of the given sample and of the observations made. These summaries or
descriptions can be either graphical or quantitative.

For example, in soccer, a summary of the individual performance of each player is a
descriptive statistic.

However, descriptive statistics does not reach conclusions beyond the given data or
the hypothesis made by the researcher. It is just a simple way of describing the data.

Generally, the kinds of measures that are used with descriptive statistics are:

I. Measures of Central Tendency: A measure of central tendency describes the data
which lie at the center of a given frequency distribution. The main measures of central
tendency are the mean, median and mode.

II. Measures of Spread or Dispersion: A measure of spread describes how the scores
are spread over the whole distribution. Standard deviation, variance, quartiles, range and
absolute deviation are included in the measures of spread.

III. Graphical Representation: There are several different types of graphs that are used to
describe statistical data. These graphs include the histogram, bar graph, box and
whisker plot, line graph, scatter plot, ogive, pie chart and many more.

We will discuss these three measures of descriptive statistics in the coming chapter

13.2 Inferential Statistics


Inferential statistics is the type of statistics which deals with drawing conclusions. It makes inferences
and predictions about a population by analysing a sample. Basically, inferential statistics is the
procedure of drawing predictions and conclusions from data which is subject to
random variation.

This type of statistics is used to make estimates and test hypotheses using the given
data.

For example, by comparing the performance of the football players of two teams, inferential statistics
will help us predict which team will win the next match.

There are two major divisions of inferential statistics:


I. Confidence Interval: A confidence interval is an interval that
provides a range of values for a parameter of the given population.

II. Hypothesis Test: Hypothesis tests, also known as tests of significance, test some claim
about the population by analysing a sample.

We will learn about confidence intervals and hypothesis tests in detail later

14 MCQ’s (Multiple Choice Questions)


Click the next button on the bottom of your screen to attempt the Test containing quality
MCQ’s on this topic.

1. The pattern of the test is based on the real examination pattern.
2. This helps you in assessing your understanding and is very useful in improving retention.
3. You will also get to know the correct answers and related explanation at the end of the
test.

So do not forget to attempt these MCQ’s.

Happy Learning!!!
