LAS Unit-04 Study Material F.Y.B.tech Sem-II 2022-23

The document provides an overview of statistical methods, including the definition of statistics, the scope of its applications, and the importance of data collection. It distinguishes between primary and secondary data, outlines various types of data (categorical, ordinal, discrete, continuous), and discusses methods for data presentation, including diagrams and charts. Additionally, it explains frequency distributions and their types, emphasizing the role of statistics in research and decision-making.

Uploaded by

parveznadaf05

Vishwakarma University

Faculty of Science &Technology Engineering


Department of Engineering Sciences
F.Y.B.Tech Computer Engg Sem-II A.Y.2022-23
Study Material
Course: Linear Algebra & Statistics
Unit-04: Scope of Statistical Methods

Statistics is a field of mathematics concerned with data analysis. For the last few centuries,
statistics has remained a part of mathematics, as the original work was done by
mathematicians such as Pascal, James Bernoulli, De Moivre, Laplace, Gauss and others. Until
the early nineteenth century, statistics was mainly concerned with official statistics, that is,
the collection of information on revenue, population, etc. The science of statistics developed
gradually and its field of application widened steadily. In everyday use, the term statistics
generally means numerical facts and figures.
Statistics: the collection, presentation, analysis and interpretation of numerical data.
Statistics is the study of the collection, organization, analysis, interpretation and
presentation of data with the use of quantified models. In short, it is a mathematical tool
that is used to collect and summarize data.

Statistics is a branch of applied mathematics that involves the collection, description,
analysis, and inference of conclusions from quantitative data.

Scope of Statistics:

1. It presents the facts in numerical figures. For example, recording the sales of various
products in a company.
2. It studies the relationship between two or more phenomena. For example, in
medical science, for collection, presentation and analysis of observed facts relating
to causes and incidence of diseases and the result of the application of various
drugs.
3. It helps in the formulation of policies. For example, Economic policy is formulated
by governments by considering and correlating the data regarding profits &
dividends, assets & liabilities, income & expenditures.
4. It presents complex facts in a simplified form. For example, in Astronomy, to find
most probable measurements of the distance, sizes, masses and densities of
heavenly bodies by means of observations.
5. It helps in forecasting. For example, stock market results, sales, GDP etc.
6. It provides techniques for testing of hypotheses. For example, in planning the
marketing strategies.

Prepared by Jameel .A. Ansari


7. It provides techniques for making decisions under uncertainty. For example, finding
the probabilities for change in customer demand.
8. Statistical methods are extensively used in every type of research work. Whether it
is agriculture, health, or social science, the statistics help in carrying out different
types of research.

Data:

Data is defined as "facts and statistics collected together for reference or analysis." In
other words, data is information that has been gathered and analyzed in order to be used
for a specific purpose.

Data is important because it helps us understand the world around us, test hypotheses
and make predictions.

Data collection:

In statistics, data collection is the process of gathering information from all relevant
sources to find a solution to the research problem. It helps to evaluate the outcome
of the problem. Data collection methods allow a person to reach an answer
to the relevant question. Most organizations use data collection methods to make
assumptions about future probabilities and trends.

Data can be collected in two different categories:

1. Primary data, and
2. Secondary data

Primary Data: Primary data is data collected by the investigator himself for
the purpose of a specific inquiry or study. Such data is original in character and is
generated by surveys conducted by individuals, research institutions or other organisations.

Example: If a researcher wants to know the impact of a noon meal scheme for
school children, he has to undertake a survey and collect data on the opinions of parents and
children by asking relevant questions. Data collected for this purpose is called
primary data.

The primary data can be collected by the following five methods.


1. Direct personal interviews.
2. Indirect Oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.

Secondary Data: Secondary data are those data which have been already collected and
analysed by some earlier agency for its own use; and later the same data are used by a
different agency.
According to W. A. Neiswanger, ‘A primary source is a publication in which the data are
published by the same authority which gathered and analysed them.
A secondary source is a publication, reporting the data which have been gathered by other
authorities and for which others are responsible’.

Sources of Secondary data: In most of the studies the investigator finds it impracticable to
collect first-hand information on all related issues and as such he makes use of the data
collected by others. There is a vast amount of published information from which statistical
studies may be made and fresh statistics are constantly in a state of production.
The sources of secondary data can broadly be classified under two heads:
1. Published sources, and
2. Unpublished sources.

1. Published Sources:

The various sources of published data are: Clinical and other personal records, death
certificates, published mortality statistics, census publications, etc.

Examples include:
1. Official publications of Central Statistical Authority
2. Publication of Ministry of Health and Other Ministries
3. News Papers and Journals.
4. International Publications like Publications by WHO, World Bank, UNICEF
5. Records of hospitals or any Health Institutions.
Note: A lot of secondary data is available on the internet. We can access it at any time for
further study.

2. Unpublished Sources.
Not all statistical material is published. There are various sources of unpublished
data such as records maintained by various Government and private offices, studies made
by research institutions, scholars, etc. Such sources can also be used where necessary.

Precautions in the use of Secondary data:

The following points should be considered when using secondary data:
1. How the data has been collected and processed
2. The accuracy of the data
3. How far the data has been summarized
4. How comparable the data is with other tabulations
5. How to interpret the data, especially when figures collected for one purpose are used for
another

Generally speaking, with secondary data, people have to compromise between what
they want and what they are able to find.

Types of Data in Statistics:

Categorical Or Qualitative Data:

To explain the types of data, we have categorized the types of statistical data with
examples and detailed insights here.

1. Nominal Data

Nominal data is a type of data that includes names or labels. Examples of nominal data
include gender, Nationality, Religion, etc. In research studies, nominal data is often
used to group participants into different categories. For instance, researchers may
want to study the effects of a new treatment on men and women. In this case, the
nominal data would be used to separate the participants into two groups: men and
women.

Categories are sometimes coded with numbers. For instance, a
customer satisfaction survey might ask customers to rate their experience on a scale
from 1 to 5, with 1 being "very unsatisfied" and 5 being "very satisfied." In this case,
the numerical values are only labels for categories (satisfaction levels). Because these
categories also have a natural order, such ratings are more precisely treated as ordinal
data, described next.

2. Ordinal Data

The term ordinal data refers to data with labels that indicate ranking or order.
Examples of ordinal data include social class (upper class, middle class, lower class),
opinions (excellent, good, bad), and satisfaction ratings (Very Satisfied, Satisfied,
Neutral, Unsatisfied, Very Unsatisfied)

Numerical or Quantitative Data:


1. Discrete Data
Discrete data is countable, meaning that it can be broken down into individual units.
This can include things like the number of people in a room or the number of cars on
the road. Discrete data is usually collected through surveys or experiments, and it can
be represented using graphs or tables. One advantage of discrete data is that it is easy
to understand and interpret. However, one downside is that it can be difficult to
obtain accurate results if the sample size is small. Additionally, discrete data can only
be used to measure a limited number of variables.

For example, if you were tracking the number of students in each grade at a school,
the data would be discrete because there are a finite number of possibilities (ranging
from 0 to the maximum number of students in any given grade).

2. Continuous Data

Continuous data is a type of data that can take on any value within a certain range.
That is, the data is not divided into distinct values but rather exists as points along a
continuum. Continuous data is often difficult to collect because it requires precise
measurements. It is also more difficult to analyze than discrete data because it often
contains errors. However, continuous data provides more information than discrete
data and can be used to make more accurate predictions. For these reasons,
continuous data is often used in fields such as weather forecasting and medicine.

For example, temperature is continuous data because it can take any value within
a certain range (32 degrees, 32.5 degrees, 33.27 degrees, etc.).

Diagrammatic Presentation of Data:

Diagrams are an essential tool for the presentation of statistical data. They
are mainly geometrical figures such as lines, circles, bars, etc. Presenting statistics
with the help of diagrams makes the data easier to read and understand.

The representation of data with the help of diagrams, in order to simplify the statistics
surrounding the data, is called diagrammatic representation of data.
These diagrams use geometrical figures to improve the overall
presentation and offer visual assistance to the reader.

From the charts and diagrams, we will be able to make an analysis and possibly predict
future outcomes.

Types of Diagram used in Data Presentation:

Line Diagram:

Line diagram is used to represent specific data across varying parameters. A line represents
the sequence of data connected against a particular variable.
A line chart is also called a line plot or line graph. It is used to show how variables and
information change over time. The information on a line chart is represented with points
and the points are connected with a continuous line.

For example, if you have information on how the price of petrol changed over 5 months,
you can represent that in a line chart so that the trend can be viewed and studied.

Scatter Plot:

A scatter plot displays the relationship between two sets of data. In a scatter plot, dots
are used to represent the values of the data. After collecting data and plotting it, the
pattern of the dots on the plots will tell the relationship between the sets that are being
compared.

For example, if you have information about a person's weight at different ages of his life,
you can represent that in a scatter plot and it will look like the figure below.

Bar Diagram:
Bar Diagram is used mostly for the comparison of statistical data. It is one of the most
straightforward representations of data with the use of rectangular objects of equal width.

Properties of Bar Diagram:


a. The Bars can be used in vertical and horizontal directions.

b. These Bars all have a uniform width.

c. All the Bars have a common base.


d. The height of the Bar usually corresponds to the required value.

e.g Given data set

Bar Chart Presentation:

Solution:

(i) We represent the above data by a simple bar diagram in the following manner:

Step-1: Years are marked along the X-axis and labeled as ‘Year’.

Step-2: Values of Production Cost are marked along the Y-axis and labeled as ‘Production
Cost (in lakhs of ₹)’.

Step-3: Vertical rectangular bars are erected on the years marked, with heights
proportional to the magnitude of the respective production cost.

Step-4: The vertical bars are all filled with the same colour.

Pie Chart:
To understand a Pie Diagram, it helps to brush up on the fundamentals of the
geometry of a circle. For the statistical representation of data, the
sectors of a circle are used to represent the components of a dataset. A sector is the
region of a circle enclosed by two radii and the arc between them.

The Pie diagram is a circular diagram. As the diagram looks like a pie, it is given this name. A
circle, which spans 360°, is divided into different sectors. The angles of the sectors, subtended at
the center, are proportional to the magnitudes of the frequencies of the components.

Procedure:

The following procedure can be followed to draw a Pie diagram for a given data:

i. Calculate total frequency, say, N.

ii. Compute the angle for each component using the formula (class frequency / N) × 360°.

iii. Draw a circle of sufficient radius and draw one radius as a horizontal line.

iv. Draw the first sector in the anti-clockwise direction at an angle calculated for the
first component.

v. Draw the second sector adjacent to the first sector at an angle corresponding to the
second component.

vi. This process may be continued for all the components.

vii. Shade/colour each sector with different shades/colours.

viii. Write legends to each component.

e.g: Data Set:

Solution :

The following procedure is followed to draw a Pie diagram for a given data:

i. Calculate the total expenditure, say, N.

ii. Compute angles for each component food, clothing, recreation, education, rent and
miscellaneous using the formula (class frequency / N) × 360°.

iii. Draw a circle of sufficient radius and draw one radius as a horizontal line.

iv. Draw the first sector in the anti-clockwise direction at an angle calculated for the first
component food.

v. Draw the second sector adjacent to the first sector at an angle corresponding to the second
component clothing.

vi. This process is continued for all the components namely recreation, education, rent and
miscellaneous.

vii. Shade/colour each sector with different shades/colours.

viii. Write legends to each component.
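
The angle computation in step ii can be sketched in Python. This is only an illustration: the expenditure amounts below are hypothetical placeholders (the data set of the example is not reproduced here); only the component names and the formula (class frequency / N) × 360° come from the text.

```python
# Hypothetical expenditure per component (illustrative values only,
# not the data set used in the example above).
expenditure = {
    "food": 400,
    "clothing": 100,
    "recreation": 50,
    "education": 150,
    "rent": 200,
    "miscellaneous": 100,
}


def pie_angles(values):
    """Return the sector angle (in degrees) for each component."""
    total = sum(values.values())  # N, the total frequency
    return {name: value / total * 360 for name, value in values.items()}


angles = pie_angles(expenditure)
for name, angle in angles.items():
    print(f"{name:15s} {angle:6.1f} degrees")
```

Whatever the amounts are, the angles must add up to 360°, which is a quick check on the arithmetic before drawing the sectors.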

Histogram:

Just like the bar chart, the data in a histogram is represented with bars, but a
histogram organizes data in ranges. It shows the frequency at which different ranges of
data occur.

A histogram is a bar-graph representation of data in which the range of
outcomes is grouped into columns along the x-axis. In the same histogram, the
count of occurrences in the data for each column is represented on the y-axis.

e.g: Given-the height of the trees (in inches): 61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73,
73.5, 74, 74.5, 76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87. We can
group the data as follows in a frequency distribution table by setting a range:

This data can be now shown using a histogram. We need to make sure that while plotting a
histogram, there shouldn’t be any gaps between the bars.
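
As a rough sketch, the grouping step for the tree heights can be done in Python. The class width of 5 inches and the starting point of 60 are assumptions made for this illustration; the ranges used in the original table are not shown here, and the function name is hypothetical.

```python
heights = [61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5,
           76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84,
           85, 87]


def group_into_classes(data, start, width):
    """Count observations in classes [start, start + width), and so on."""
    counts = {}
    for x in data:
        index = int((x - start) // width)   # which class x falls into
        lower = start + index * width       # lower limit of that class
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


table = group_into_classes(heights, start=60, width=5)
for lower, freq in table.items():
    print(f"{lower}-{lower + 5}: {freq}")
```

Each class includes its lower limit and excludes its upper limit, so the classes touch without overlapping. That is why the bars of the histogram are drawn without gaps.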

Frequency Distribution:

A frequency distribution is a mathematical function showing the number of instances in which
a variable takes each of its possible values.
The frequency of a value is the number of times it occurs in a dataset. A frequency
distribution is the pattern of frequencies of a variable: the number of times each
possible value of a variable occurs in a dataset.

Types of frequency distributions:


There are four types of frequency distributions:

• Ungrouped frequency distributions: The number of observations of each value of
a variable.
o You can use this type of frequency distribution for categorical variables.
• Grouped frequency distributions: The number of observations of each class
interval of a variable. Class intervals are ordered groupings of a variable’s values.
o You can use this type of frequency distribution for quantitative variables.
• Relative frequency distributions: The proportion of observations of each value or
class interval of a variable.
o You can use this type of frequency distribution for any type of
variable when you’re more interested in comparing frequencies than the
actual number of observations.
• Cumulative frequency distributions: The sum of the frequencies less than or equal
to each value or class interval of a variable.
o You can use this type of frequency distribution for ordinal or quantitative
variables when you want to understand how often observations fall below
certain values.

How to make a frequency table:


Frequency distributions are often displayed using frequency tables. A frequency table is
an effective way to summarize or organize a dataset. It’s usually composed of two columns:

• The values or class intervals
• Their frequencies

The method for making a frequency table differs between the four types of frequency
distributions. You can follow the guides below or use software such as Excel, SPSS, or R to
make a frequency table.

How to make an ungrouped frequency table

1. Create a table with two columns and as many rows as there are values of the
variable. Label the first column using the variable name and label the second column
“Frequency.” Enter the values in the first column.

o For ordinal variables, the values should be ordered from smallest to largest
in the table rows.
o For nominal variables, the values can be in any order in the table. You may
wish to order them alphabetically or in some other logical order.
2. Count the frequencies. The frequencies are the number of times each value occurs.
Enter the frequencies in the second column of the table beside their corresponding
values.
o Especially if your dataset is large, it may help to count the frequencies
by tallying. Add a third column called “Tally.” As you read the observations,
make a tick mark in the appropriate row of the tally column for each
observation. Count the tally marks to determine the frequency.
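
The counting steps above could be sketched in Python as follows. The survey responses are hypothetical values invented for this illustration; `Counter` from the standard library plays the role of the tally column.

```python
from collections import Counter

# Hypothetical observations of an ordinal variable (illustrative only).
responses = ["good", "excellent", "bad", "good", "good",
             "excellent", "bad", "good"]

# For an ordinal variable the rows are listed from smallest to largest.
order = ["bad", "good", "excellent"]

counts = Counter(responses)  # tallies each value as it is read
frequency_table = [(value, counts[value]) for value in order]

print("Opinion    Frequency")
for value, freq in frequency_table:
    print(f"{value:10s} {freq}")
```

For a nominal variable the `order` list could simply be alphabetical, since the values have no inherent ranking.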

Example:

How to make a grouped frequency table:

1. Divide the variable into class intervals. Below is one method to divide a variable
into class intervals. Different methods will give different answers, but there’s no
agreement on the best method to calculate class intervals.

o Calculate the range. Subtract the lowest value in the dataset from the
highest.
o Decide the class interval width. There are no firm rules on how to choose
the width, but the following formula is a rule of thumb:

o You can round this value to a whole number or a number that’s convenient to
add (such as a multiple of 10).
o Calculate the class intervals. Each interval is defined by a lower limit and
upper limit. Observations in a class interval are greater than or equal to the
lower limit and less than the upper limit:

The lower limit of the first interval is the lowest value in the dataset. Add the
class interval width to find the upper limit of the first interval and the lower
limit of the second interval. Keep adding the interval width to calculate more
class intervals until you exceed the highest value.

2. Create a table with two columns and as many rows as there are class intervals.
Label the first column using the variable name and label the second column
“Frequency.” Enter the class intervals in the first column.

3. Count the frequencies. The frequencies are the number of observations in each
class interval. You can count by tallying if you find it helpful. Enter the frequencies in
the second column of the table beside their corresponding class intervals.

Example: Grouped frequency distribution

A sociologist conducted a survey of 20 adults. She
wants to report the frequency distribution of the ages of the survey respondents. The
respondents were the following ages in years:
52, 34, 32, 29, 63, 40, 46, 54, 36, 36, 24, 19, 45, 20, 28, 29, 38, 33, 49, 37

Round the class interval width to 10.

The class intervals are 19 ≤ a < 29, 29 ≤ a < 39, 39 ≤ a < 49, 49 ≤ a < 59, and 59 ≤ a < 69.

Grouped frequency table of the ages of survey participants
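
A minimal Python sketch of the tallying for this example, using the 20 ages and the class width of 10 given above. The function name is hypothetical; each class includes its lower limit and excludes its upper limit, as in the intervals listed.

```python
ages = [52, 34, 32, 29, 63, 40, 46, 54, 36, 36,
        24, 19, 45, 20, 28, 29, 38, 33, 49, 37]


def grouped_frequencies(data, lowest, width):
    """Return (lower limit, upper limit, frequency) for each class interval."""
    n_classes = (max(data) - lowest) // width + 1
    counts = [0] * n_classes
    for value in data:
        counts[(value - lowest) // width] += 1  # lower <= value < upper
    return [(lowest + i * width, lowest + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = grouped_frequencies(ages, lowest=min(ages), width=10)
for lower, upper, freq in table:
    print(f"{lower} <= a < {upper}: {freq}")
```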

Example:

How to make a relative frequency table

1. Create an ungrouped or grouped frequency table.


2. Add a third column to the table for the relative frequencies. To calculate the
relative frequencies, divide each frequency by the sample size. The sample size is the
sum of the frequencies.
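
As an illustration, the relative frequencies can be computed in Python. The frequencies used here are the counts obtained by tallying the 20 survey ages listed earlier into the classes 19 ≤ a < 29 through 59 ≤ a < 69; the variable names are chosen only for this sketch.

```python
class_labels = ["19-29", "29-39", "39-49", "49-59", "59-69"]
frequencies = [4, 9, 3, 3, 1]        # tallied from the 20 ages in the example

sample_size = sum(frequencies)       # the sample size is the sum of the frequencies
relative = [f / sample_size for f in frequencies]

for label, f, r in zip(class_labels, frequencies, relative):
    print(f"{label}: frequency {f}, relative frequency {r:.2f}")
```

The relative frequencies always add up to 1, which is a convenient check.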

Examples: Relative frequency distribution

How to make a cumulative frequency table

1. Create an ungrouped or grouped frequency table for an ordinal or quantitative
variable. Cumulative frequencies don’t make sense for nominal variables because
the values have no order: one value isn’t more than or less than another value.
2. Add a third column to the table for the cumulative frequencies. The cumulative
frequency is the number of observations less than or equal to a certain value or class
interval. To calculate the cumulative frequencies, add each frequency to the frequencies
in the previous rows.
3. Optional: If you want to calculate the cumulative relative frequency, add another
column and divide each cumulative frequency by the sample size.
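
The running sum described in step 2 can be sketched with `itertools.accumulate` from the Python standard library. The frequencies are again the counts tallied from the 20 survey ages in the grouped example; treat the variable names as illustrative.

```python
from itertools import accumulate

frequencies = [4, 9, 3, 3, 1]                # one entry per class interval

# Each cumulative frequency is the frequency plus all frequencies in previous rows.
cumulative = list(accumulate(frequencies))

# Optional step 3: divide by the sample size (the last cumulative frequency).
cumulative_relative = [c / cumulative[-1] for c in cumulative]

for f, c, cr in zip(frequencies, cumulative, cumulative_relative):
    print(f"frequency {f:2d}  cumulative {c:2d}  cumulative relative {cr:.2f}")
```

The last cumulative frequency equals the sample size, and the last cumulative relative frequency equals 1.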

Example: Cumulative frequency distribution

Frequency distribution can be visualized using:

• a pie chart (nominal variable),
• a bar chart (nominal or ordinal variable),
• a line chart (ordinal or discrete variable),
• or a histogram (continuous variable).

Central Tendency:

Central tendency is the tendency for the values of a random variable to cluster round its mean, mode, or median.

Measures of central tendency are summary statistics that represent the center point or
typical value of a dataset. Examples of these measures include the mean, median, and mode.
These statistics indicate where most values in a distribution fall and are also referred to as
the central location of a distribution. You can think of central tendency as the propensity
for data points to cluster around a middle value.

In statistics, the mean, median, and mode are the three most common measures of central
tendency. Each one calculates the central point using a different method. Choosing the best
measure of central tendency depends on the type of data.

Mean:

The mean is the arithmetic average, and it is probably the measure of central tendency that
you are most familiar with. Calculating the mean is simple: add up all of the values
and divide by the number of observations in your dataset.

Mean of Ungrouped Data

Let x1, x2, x3 , . . . , xn be n observations. We can find the arithmetic mean using the mean
formula:
Mean, x̄ = (x1 + x2 + ... + xn)/n

Example: The heights of 5 people are 142 cm, 150 cm, 149 cm, 156 cm, and 153 cm.
Find the mean height.

Mean height, x̄ = (142 + 150 + 149 + 156 + 153)/5

= 750/5

= 150

Mean, x̄ = 150 cm

Thus, the mean height is 150 cm.

When the data is present in tabular form, we use the following formula:

Mean, x̄ = (x1f1 + x2f2 + ... + xnfn)/(f1 + f2 + ... + fn)

Consider the following example.

Example 1: Find the mean of the following distribution:

x    4    6    9    10   15
f    5    10   10   7    8

Solution:

Calculation table for arithmetic mean:

xi       fi            xifi
4        5             20
6        10            60
9        10            90
10       7             70
15       8             120
Total    ∑ fi = 40     ∑ xi fi = 360

Mean, x̄ = (∑xi fi)/(∑fi)

= 360/40

=9

Thus, Mean = 9
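
The same calculation can be checked with a few lines of Python. This is a minimal sketch of the formula x̄ = (∑ xi fi)/(∑ fi); the function name is chosen only for this illustration.

```python
x = [4, 6, 9, 10, 15]      # values
f = [5, 10, 10, 7, 8]      # frequencies


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(x_i * f_i) / sum(f_i)."""
    total = sum(v * w for v, w in zip(values, freqs))   # sum of x_i * f_i
    return total / sum(freqs)


mean = frequency_mean(x, f)
print(mean)
```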

Example 2: Here is an example where the data is in the form of class intervals. The
following table indicates the data on the number of patients visiting a hospital in a month.
Find the average number of patients visiting the hospital in a day.

Number of patients visiting hospital    Number of days
0-10                                    2
10-20                                   6
20-30                                   9
30-40                                   7
40-50                                   4
50-60                                   2

Solution:

In this case, we find the class mark (also called the mid-point of a class) for each class.

Note: Class mark = (lower limit + upper limit)/2

Let x1, x2, x3 , . . . , xn be the class marks of the respective classes.

Hence, we get the following table:

Class mark (xi)    Frequency (fi)    xifi
5                  2                 10
15                 6                 90
25                 9                 225
35                 7                 245
45                 4                 180
55                 2                 110
Total              ∑ fi = 30         ∑ xifi = 860

Mean, x̄ = (∑ xifi)/(∑ fi)

= 860/30

= 28.67

x̄ = 28.67
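
A short Python sketch of this example, computing the class marks first and then the mean. The variable names are illustrative; the class limits and frequencies are those of the table above.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
days = [2, 6, 9, 7, 4, 2]                       # frequency of each class

# Class mark = (lower limit + upper limit) / 2
class_marks = [(lower + upper) / 2 for lower, upper in classes]

mean = sum(m * f for m, f in zip(class_marks, days)) / sum(days)
print(round(mean, 2))
```

Because every observation in a class is represented by its class mark, the result is an approximation to the mean of the underlying raw data.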

Median:

The median is the middle value. It is the value that splits the dataset in half, making it a
natural measure of central tendency.

To find the median, order your data from smallest to largest, and then find the data point
that has an equal number of values above it and below it. The method for locating the
median varies slightly depending on whether your dataset has an even or odd number of
values. If the dataset has an odd number of values, the median is the middle value. If the
dataset contains an even number of values, the median is the mean of the two
middle values.

Median of Ungrouped Data

• Step 1: Arrange the data in ascending or descending order.
• Step 2: Let the total number of observations be n.
To find the median, we need to consider if n is even or odd. If n is odd, then use the
formula:
Median = [(n + 1)/2]th observation

Example 1: Let's consider the data: 56, 67, 54, 34, 78, 43, 23. What is the median?

Solution:

Arranging in ascending order, we get: 23, 34, 43, 54, 56, 67, 78. Here, n (number of
observations) = 7

So, (7 + 1)/2 = 4

∴ Median = 4th observation

Median = 54

If n is even, then use the formula:

Median = [(n/2)th obs.+ ((n/2) + 1)th obs.]/2

Example 2: Let's consider the data: 50, 67, 24, 34, 78, 43. What is the median?

Solution:

Arranging in ascending order, we get: 24, 34, 43, 50, 67, 78.

Here, n (no.of observations) = 6

6/2 = 3

Using the median formula,

Median = (3rd observation + 4th observation) / 2

= (43 + 50)/2

Median = 46.5
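
Both cases (n odd and n even) can be sketched in one small Python function. The function name is chosen for this illustration; note that Python lists are indexed from 0, so the k-th observation is at index k − 1.

```python
def median(data):
    """Median of ungrouped data, following the odd/even rule above."""
    ordered = sorted(data)            # Step 1: arrange in ascending order
    n = len(ordered)                  # Step 2: number of observations
    mid = n // 2
    if n % 2 == 1:
        # n odd: the [(n + 1)/2]th observation, which is index mid
        return ordered[mid]
    # n even: mean of the (n/2)th and ((n/2) + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([56, 67, 54, 34, 78, 43, 23]))   # Example 1
print(median([50, 67, 24, 34, 78, 43]))       # Example 2
```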

Median of Grouped Data

When the data is continuous and in the form of a frequency distribution, the median is
found as shown below:

Step 1: Find the median class.

Let N = total number of observations, i.e. ∑ fi

Note: The median class is the class whose cumulative frequency first reaches N/2.

Step 2: Use the following formula to find the median.


Median = l + (N/2 − cf) × (h/f)

where,

• l = lower limit of median class
• cf = cumulative frequency of the class preceding the median class
• N = total number of observations
• f = frequency of the median class
• h = class size

Let's consider the following example to understand this better.

Example: Find the median marks for the following distribution:

Classes      0-10   10-20   20-30   30-40   40-50
Frequency    2      12      22      8       6

Solution:

We need to calculate the cumulative frequencies to find the median.

Calculation table:

Classes    Number of students    Cumulative frequency
0-10       2                     2
10-20      12                    2 + 12 = 14
20-30      22                    14 + 22 = 36
30-40      8                     36 + 8 = 44
40-50      6                     44 + 6 = 50

N = 50

N/2 = 50/2 = 25

Median Class = (20 - 30)

l = 20, f = 22, cf = 14, h = 10

Using Median formula:


Median = l + (N/2 − cf) × (h/f)

= 20 + (25 - 14)/22 × 10

= 20 + (11/22) × 10

= 20 + 5 = 25

∴ Median = 25
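
The two steps (locating the median class, then applying the formula) can be sketched in Python. The function name and argument layout are assumptions made for this illustration, and all classes are assumed to have the same size h.

```python
def grouped_median(lower_limits, freqs, h):
    """Median = l + (N/2 - cf) * (h/f) for equal-width classes."""
    n = sum(freqs)                    # N, total number of observations
    cf = 0                            # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:           # N/2 lies in this class: the median class
            return l + (n / 2 - cf) * h / f
        cf += f
    raise ValueError("no observations")


lower_limits = [0, 10, 20, 30, 40]
freqs = [2, 12, 22, 8, 6]
print(grouped_median(lower_limits, freqs, h=10))
```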

Mode:

The mode is the value that occurs the most frequently in your data set, making it a different
type of measure of central tendency than the mean or median.

To find the mode, sort the values in your dataset by numeric values or by categories. Then
identify the value that occurs most often.

Mode = Observation with maximum frequency

For example, in the data 6, 8, 9, 3, 4, 6, 7, 6, 3, the value 6 appears the greatest number of
times. Thus, mode = 6. An easy way to remember mode is: Most Often Data Entered.

Note: A data set may have no mode, 1 mode, or more than 1 mode. Depending upon the number of
modes the data has, it can be called unimodal, bimodal, trimodal, or multimodal.

The example discussed above has only 1 mode, so it is unimodal.
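
A small Python sketch for finding the mode or modes of ungrouped data. The function name is chosen for this illustration; it returns every value that shares the maximum frequency, so a bimodal data set yields two values.

```python
from collections import Counter


def modes(data):
    """Return all observations that occur with the maximum frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([6, 8, 9, 3, 4, 6, 7, 6, 3]))
```

If every value occurs equally often, this sketch returns all of them; by the convention in the note above, such a data set is usually described as having no mode.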

Mode of Grouped Data

When the data is continuous, the mode can be found using the following steps:

• Step 1: Find modal class i.e. the class with maximum frequency.
• Step 2: Find mode using the following formula:
Mode = l + [(fm − f1)/(2fm − f1 − f2)] × h

where,

• l = lower limit of modal class,
• fm = frequency of modal class,
• f1 = frequency of class preceding modal class,
• f2 = frequency of class succeeding modal class,
• h = class width

Mode formula equivalently is written as follows as well:

Consider the following example to understand the formula.

Example: Find the mode of the given data:

Marks Obtained        0-20   20-40   40-60   60-80   80-100
Number of students    5      10      12      6       3

Solution:

The highest frequency = 12, so the modal class is 40-60.

l = lower limit of modal class = 40

fm = frequency of modal class = 12


f1 = frequency of class preceding modal class = 10
f2 = frequency of class succeeding modal class = 6

h = class width = 20

Using the mode formula,


Mode = l + [(fm − f1)/(2fm − f1 − f2)] × h

= 40 + [(12 − 10)/(24 − 10 − 6)] × 20

= 40 + (2/8) × 20

= 45

∴ Mode = 45
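
The grouped-mode formula can be sketched in Python as follows. The function name is hypothetical, and taking a missing neighbouring frequency as 0 (when the modal class is the first or last class) is an assumption of this sketch rather than something stated in the text.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode = l + (fm - f1) / (2*fm - f1 - f2) * h for equal-width classes."""
    i = freqs.index(max(freqs))                      # Step 1: modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                # preceding class (assumed 0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # succeeding class (assumed 0 if none)
    return lower_limits[i] + (fm - f1) / (2 * fm - f1 - f2) * h


lower_limits = [0, 20, 40, 60, 80]
students = [5, 10, 12, 6, 3]
print(grouped_mode(lower_limits, students, h=20))
```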

Mean, Median and Mode Formulas

We covered the formulas and methods to find the mean, median, and mode for a grouped
and ungrouped set of data. Let us summarize and recall them using the list of mean,
median, and mode formulas given below,

• Mean formula for ungrouped data: Mean = Sum of all observations / Number of
observations
• Mean formula for grouped data: x̄ = (x1f1 + x2f2 + ... + xnfn)/(f1 + f2 + ... + fn)
• Median formula for ungrouped data: If n is odd, Median = ((n + 1)/2)th observation.
If n is even, Median = [(n/2)th observation + ((n/2) + 1)th observation]/2
• Median formula for grouped data: $\text{Median} = l + \left(\frac{N}{2} - cf\right) \times \frac{h}{f}$, where
o l = lower limit of median class
o N = total number of observations (sum of all frequencies)
o cf = cumulative frequency of the class preceding the median class
o f = frequency of the median class
o h = class size
• Mode formula for ungrouped data: Mode = Observation with maximum
frequency
• Mode formula for grouped data: $\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$, where
o l = lower limit of modal class,
o fm = frequency of modal class,
o f1 = frequency of class preceding modal class,
o f2 = frequency of class succeeding modal class,
o h = class width
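
As a quick illustration of the grouped-data formulas in this summary, the sketch below computes the mean and median of the marks distribution used in the mode example (classes 0-20 to 80-100 with frequencies 5, 10, 12, 6, 3). The function names are hypothetical, class midpoints are used as the x values in the mean formula, and equal class widths are assumed.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(x_i * f_i) / sum(f_i)."""
    total = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / total

def grouped_median(lower_limits, frequencies, h):
    """Median of grouped data: l + (N/2 - cf) * h / f."""
    n = sum(frequencies)
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, frequencies):
        # The median class is the first class whose cumulative
        # frequency reaches N/2.
        if cf + f >= n / 2:
            return l + (n / 2 - cf) * h / f
        cf += f

lower = [0, 20, 40, 60, 80]
freq = [5, 10, 12, 6, 3]
mid = [l + 10 for l in lower]        # class midpoints 10, 30, 50, 70, 90

mean = grouped_mean(mid, freq)
median = grouped_median(lower, freq, 20)
print(round(mean, 2), median)        # 45.56 45.0
```

Here N = 36, so N/2 = 18 falls in the class 40-60 (cumulative frequencies 5, 15, 27), giving 40 + (18 − 15) × 20/12 = 45.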

Take a quick look at the figure below with mean mode median formulas.



Quartiles:

The values which divide an array (a set of data arranged in ascending or descending order)
into four equal parts are called quartiles. The first, second and third quartiles are denoted
by Q1, Q2 and Q3 respectively. The first and third quartiles are also called the lower and
upper quartiles respectively. The second quartile is the median, the middle value.

Quartiles for Ungrouped Data:

Quartiles for ungrouped data are calculated by the following formula:

$$Q_i = \text{value of the } \left(\frac{i(N + 1)}{4}\right)\text{th item}, \quad i = 1, 2, 3$$

In order to apply the formula, we need to arrange the data in ascending order, i.e. in
the form of an array.
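
A minimal Python sketch of the ungrouped procedure is given below. It assumes the positional rule that Qi is the item at position i(N + 1)/4 of the sorted array, by analogy with the (N + 1) × i/10 rule given for deciles in this unit; linear interpolation between neighbouring items for fractional positions is a further assumption. The function names and the sample data are hypothetical.

```python
def position_value(array, pos):
    """Value at 1-based position `pos` of a sorted array.

    Fractional positions are handled by linear interpolation between
    the two neighbouring items (an assumed convention).
    """
    n = len(array)
    pos = min(max(pos, 1), n)        # keep the position inside the array
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return array[whole - 1]
    return array[whole - 1] + frac * (array[whole] - array[whole - 1])

def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) using the assumed position i(N + 1)/4."""
    array = sorted(data)             # arrange the data as an array first
    return position_value(array, i * (len(array) + 1) / 4)

data = [14, 2, 10, 6, 12, 4, 8]      # hypothetical observations, N = 7
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 4 8 12
```

With N = 7 the positions are 2, 4 and 6, so the quartiles are the 2nd, 4th and 6th items of the sorted array.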

Quartiles for Grouped Data:

The quartiles may be determined from grouped data in the same way as the median, except
that in place of N/2 we use N/4 (for Q1) or 3N/4 (for Q3). For calculating quartiles from
grouped data we first form a cumulative frequency column. Quartiles for grouped data are
calculated from the following formula:



$$Q_i = l + \left(\frac{iN}{4} - cf\right) \times \frac{h}{f}, \quad i = 1, 2, 3$$

where,
l = lower class boundary of the class containing Qi, i.e. the class in whose cumulative
frequency iN/4 lies

h = class interval size of the class containing Qi

f = frequency of the class containing Qi

N = number of values, or the total frequency

cf = cumulative frequency of the class preceding the class containing Qi
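
The grouped quartile formula can be sketched in Python as follows, reusing the marks distribution from the mode example (classes 0-20 to 80-100 with frequencies 5, 10, 12, 6, 3). The function name `grouped_quantile` and its `parts` parameter are hypothetical; the quartile class is taken to be the first class whose cumulative frequency reaches iN/4, and equal class widths are assumed.

```python
def grouped_quantile(lower_limits, frequencies, h, i, parts=4):
    """Q_i = l + (i*N/parts - cf) * h / f for grouped data.

    parts=4 gives quartiles; the same rule with parts=10 or parts=100
    gives the deciles and percentiles of grouped data.
    """
    n = sum(frequencies)
    target = i * n / parts
    cf = 0  # cumulative frequency of the preceding classes
    for l, f in zip(lower_limits, frequencies):
        # First non-empty class whose cumulative frequency reaches the target.
        if f > 0 and cf + f >= target:
            return l + (target - cf) * h / f
        cf += f
    raise ValueError("i must satisfy 0 < i <= parts")

lower = [0, 20, 40, 60, 80]
freq = [5, 10, 12, 6, 3]
q1 = grouped_quantile(lower, freq, 20, 1)
q2 = grouped_quantile(lower, freq, 20, 2)
q3 = grouped_quantile(lower, freq, 20, 3)
print(q1, q2, q3)  # 28.0 45.0 60.0
```

For Q1, N/4 = 9 lies in the class 20-40 (cf = 5, f = 10), giving 20 + (9 − 5) × 20/10 = 28. The second quartile agrees with the grouped median.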


Quartile Deviation:

The quartile deviation is half the difference between the third quartile and the first quartile
of a frequency distribution, or simply distribution. Mathematically, quartile deviation is
represented as follows:

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

Quartile deviation is also known as the semi-interquartile range. Here, the difference between
the third and first quartiles, Q3 − Q1, is called the interquartile range. The interquartile
range may be taken as a measure of dispersion (i.e. the extent to which the values are spread
out from the average).
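
As a worked illustration (not part of the original notes), take the marks distribution used in the mode example: classes 0-20, 20-40, 40-60, 60-80, 80-100 with frequencies 5, 10, 12, 6, 3, so N = 36 and the cumulative frequencies are 5, 15, 27, 33, 36. Applying the grouped quartile formula and then halving the difference gives:

$$\begin{aligned}
Q_1 &= 20 + \left(\frac{36}{4} - 5\right) \times \frac{20}{10} = 28 \\
Q_3 &= 40 + \left(\frac{3 \times 36}{4} - 15\right) \times \frac{20}{12} = 60 \\
\text{Interquartile range} &= Q_3 - Q_1 = 60 - 28 = 32 \\
Q.D. &= \frac{Q_3 - Q_1}{2} = \frac{32}{2} = 16
\end{aligned}$$

Here the class 40-60 is taken as the class containing Q3 because its cumulative frequency (27) is the first to reach 3N/4 = 27.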



Deciles:

The values which divide an array into ten equal parts are called deciles. The first, second,
..., ninth deciles are denoted by D1, D2, ..., D9 respectively. The fifth decile (D5)
corresponds to the median. The second, fourth, sixth and eighth deciles, which collectively
divide the data into five equal parts, are called quintiles.
Deciles for Ungrouped Data:

Deciles for ungrouped data will be calculated from the following formula:

$$D_i = \text{value of the } \left(\frac{i(N + 1)}{10}\right)\text{th item}, \quad i = 1, 2, 3, \ldots, 9$$

Deciles for grouped data will be calculated from the following formula:

$$D_i = l + \left(\frac{iN}{10} - cf\right) \times \frac{h}{f}, \quad i = 1, 2, 3, \ldots, 9$$
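
As a worked illustration of the grouped formula (not part of the original notes), consider the fifth decile of the marks distribution used in the mode example: classes 0-20, 20-40, 40-60, 60-80, 80-100 with frequencies 5, 10, 12, 6, 3. Here N = 36, so 5N/10 = 18 lies in the class 40-60, whose preceding cumulative frequency is cf = 15, with f = 12 and h = 20:

$$\begin{aligned}
D_5 &= l + \left(\frac{5N}{10} - cf\right) \times \frac{h}{f} \\
&= 40 + (18 - 15) \times \frac{20}{12} \\
&= 40 + 5 \\
&= 45
\end{aligned}$$

Since 5N/10 = N/2, this is the same computation as the grouped median, which is why the fifth decile coincides with the median.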

Percentiles:

The values which divide an array into one hundred equal parts are called percentiles. The
first, second, ..., ninety-ninth percentiles are denoted by P1, P2, ..., P99 respectively.

The 50th percentile (P50) corresponds to the median. The 25th percentile (P25) corresponds
to the first quartile and the 75th percentile (P75) corresponds to the third quartile.
Percentiles for Ungrouped Data:

Percentiles for ungrouped data can be calculated from the following formula:

$$P_i = \text{value of the } \left(\frac{i(N + 1)}{100}\right)\text{th item}, \quad i = 1, 2, 3, \ldots, 99$$

Percentiles for Grouped Data:

Percentiles can also be calculated for grouped data with the help of the following formula:

$$P_i = l + \left(\frac{iN}{100} - cf\right) \times \frac{h}{f}, \quad i = 1, 2, 3, \ldots, 99$$
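
For ungrouped data, the link between percentiles, quartiles and the median can be checked with a short Python sketch. It assumes the positional rule that Pi is the item at position i(N + 1)/100 of the sorted array, by analogy with the decile rule (N + 1) × i/10 given in this unit; linear interpolation for fractional positions is a further assumption. The function name `percentile` and the scores are hypothetical.

```python
def percentile(data, i):
    """i-th percentile (i = 1..99) using the assumed position i(N + 1)/100."""
    array = sorted(data)
    n = len(array)
    pos = i * (n + 1) / 100
    pos = min(max(pos, 1), n)        # keep the position inside the array
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return array[whole - 1]
    # Assumed convention: interpolate linearly between neighbouring items.
    return array[whole - 1] + frac * (array[whole] - array[whole - 1])

# Hypothetical test scores, N = 19.
scores = [35, 42, 47, 51, 55, 58, 60, 63, 66, 70,
          72, 75, 78, 81, 84, 86, 89, 92, 95]

p25, p50, p75 = (percentile(scores, i) for i in (25, 50, 75))
print(p25, p50, p75)  # 55 70 84
```

With N = 19 the positions for P25, P50 and P75 are 5, 10 and 15, which are exactly the positions (N + 1)/4, 2(N + 1)/4 and 3(N + 1)/4 of the three quartiles.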



Measure of Dispersion:

Measures of dispersion are non-negative real numbers that help to gauge the spread of
data about a central value. These measures help to determine how stretched or squeezed
the given data is. There are five most commonly used measures of dispersion. These are
range, variance, standard deviation, mean deviation, and quartile deviation.

The most important use of measures of dispersion is that they help to get an understanding
of the distribution of data. As the data becomes more diverse, the value of the measure of
dispersion increases.

Measures of dispersion help to describe the variability in data. Dispersion is a statistical
term that can be used to describe the extent to which data is scattered. Thus, measures of
dispersion are certain types of measures that are used to quantify the dispersion of data.

The measures of dispersion can be classified into two broad categories. These are absolute
measures of dispersion and relative measures of dispersion. Range, variance, standard
deviation, mean deviation and quartile deviation fall under the category of absolute measures
of dispersion. These measures are expressed in the units of the data that is being scrutinized
(squared units in the case of variance). Coefficients of dispersion are relative measures of
dispersion. Such dispersion measures are always dimensionless.



Absolute Measures of Dispersion

If the dispersion of data within a single experiment has to be determined, then absolute
measures of dispersion should be used. These measures usually express variations in a
data set with respect to the average of the deviations of the observations. The most
commonly used absolute measures of dispersion are listed below.

Range: Given a data set, the range can be defined as the difference between the maximum
value and the minimum value.
Variance: The average squared deviation from the mean of the given data set is known as
the variance. This measure of dispersion checks the spread of the data about the mean.
Standard Deviation: The square root of the variance gives the standard deviation. Thus,
the standard deviation also measures the variation of the data about the mean.
Mean Deviation: The mean deviation gives the average of the data's absolute deviation
about the central points. These central points could be the mean, median, or mode.



Quartile Deviation: Quartile deviation can be defined as half of the difference between the
third quartile and the first quartile in a given data set.

Relative Measures of Dispersion

If data sets have different units and need to be compared, then relative measures of
dispersion are used. These measures are expressed as ratios or percentages, which makes
them unitless. Some of the relative measures of dispersion are given below:

Coefficient of Range: It is the ratio of the difference between the highest and lowest value
in a data set to the sum of the highest and lowest value.
Coefficient of Variation: It is the ratio of the standard deviation to the mean of the data
set. It is expressed in the form of a percentage.

Coefficient of Mean Deviation: This can be defined as the ratio of the mean deviation to
the value of the central point from which it is calculated.

Coefficient of Quartile Deviation: It is the ratio of the difference between the
third quartile and the first quartile to the sum of the third and first quartiles.

Measures of Dispersion Formula

Measures of dispersion are used when we want to find the scattering of data about a
central point such as the mean. The general formulas used to calculate the various
measures of dispersion are given in the tables below:

Absolute Measures of Dispersion

Absolute Measure of Dispersion: Formula

Range: $H - S$, where H is the largest value and S is the smallest value in a data set.

Variance:
1) For ungrouped data:
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2$$
2) For grouped data:
$$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N}$$

Standard Deviation:
1) For ungrouped data:
$$S.D. = \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2}$$
2) For grouped data:
$$S.D. = \sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$$

Mean Deviation:
1) For ungrouped data:
$$M.D. \text{ from mean} = \frac{\sum |x_i - \bar{x}|}{N}$$
2) For grouped data:
$$M.D. \text{ from mean} = \frac{\sum f_i |x_i - \bar{x}|}{N}$$

Quartile Deviation: $Q.D. = \frac{1}{2}(Q_3 - Q_1)$, where Q3 and Q1 are the third and first
quartiles respectively.
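
The absolute measures can be computed directly from these formulas. The Python sketch below is illustrative only: the function names and the sample observations are hypothetical, the population form (division by N) is used as in the formulas of this unit, and class midpoints are assumed as the x values for grouped data, with N taken as the total frequency.

```python
from math import sqrt

def ungrouped_dispersion(data):
    """Range, variance, standard deviation and mean deviation (about the mean)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    # Second form of the variance: sum(x^2)/N - (sum(x)/N)^2.
    shortcut = sum(x * x for x in data) / n - mean ** 2
    return {
        "range": max(data) - min(data),
        "variance": variance,
        "variance_shortcut": shortcut,
        "sd": sqrt(variance),
        "mean_deviation": sum(abs(x - mean) for x in data) / n,
    }

def grouped_variance(midpoints, frequencies):
    """Variance of grouped data: sum(f_i * (x_i - mean)^2) / N."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n

values = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical observations, mean = 5
result = ungrouped_dispersion(values)
print(result["range"], result["variance"], result["sd"], result["mean_deviation"])
# 7 4.0 2.0 1.5

# Marks distribution from the mode example, using class midpoints.
marks_variance = grouped_variance([10, 30, 50, 70, 90], [5, 10, 12, 6, 3])
print(round(sqrt(marks_variance), 2))
```

Both forms of the ungrouped variance give 4.0 for the sample data, which is a useful check when computing by hand.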



Relative Measures of Dispersion/ Coefficients of Dispersion:

Whenever we want to compare the variability of two series which differ widely in their
averages or which are measured in different units, we do not merely calculate the measures
of dispersion; we calculate the coefficients of dispersion (C.D.). The coefficients of
dispersion based on the different measures of dispersion are as follows:

Relative Measures of Dispersion              Formulas

Coefficient of Range                         (H - S) / (H + S)

Coefficient of Variation                     (S.D. / Mean) × 100

Coefficient of Mean Deviation from mean      Mean Deviation / Mean

Coefficient of Quartile Deviation            (Q3 - Q1) / (Q3 + Q1)
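
The sketch below shows why a relative measure is needed when two series are in different units. It is an illustration under stated assumptions: the function names are hypothetical, the heights and weights are made-up data, and the standard deviation uses division by N as in the formulas of this unit.

```python
from math import sqrt

def mean(data):
    return sum(data) / len(data)

def coefficient_of_range(data):
    """(H - S) / (H + S), with H the largest and S the smallest value."""
    h, s = max(data), min(data)
    return (h - s) / (h + s)

def coefficient_of_variation(data):
    """(S.D. / Mean) * 100, expressed as a percentage."""
    m = mean(data)
    sd = sqrt(sum((x - m) ** 2 for x in data) / len(data))
    return sd / m * 100

def coefficient_of_mean_deviation(data):
    """Mean deviation from the mean divided by the mean."""
    m = mean(data)
    return (sum(abs(x - m) for x in data) / len(data)) / m

def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

heights_cm = [150, 160, 170, 180, 190]   # hypothetical series in centimetres
weights_kg = [50, 60, 70, 80, 90]        # hypothetical series in kilograms

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Both series have the same standard deviation in their own units, yet the weights have the larger coefficient of variation because their mean is smaller, so they are relatively more variable.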

Measures of Dispersion and Central Tendency

Both measures of dispersion and measures of central tendency are used to describe data.
The table given below outlines the difference between the measures of dispersion and
central tendency.

Measures of Dispersion:
• When we want to quantify the variability of data we use measures of dispersion.
• Measures of dispersion include variance, standard deviation, mean deviation, quartile
deviation, etc.

Central Tendency:
• Measures of central tendency help to quantify the data's average behavior.
• Measures of central tendency are mean, median, and mode.



Important Notes on Measures of Dispersion

• Measures of dispersion are used to determine the spread of data. They are
measured about a central value.
• Measures of dispersion can be classified into two types, i.e., absolute and relative
measures of dispersion.
• Absolute measures of dispersion are expressed in the units of the data (squared units
for variance), whereas relative measures are unitless.
• Range, variance, standard deviation, quartile deviation and mean deviation are
absolute measures of dispersion.
• Coefficients of dispersion are relative measures of dispersion.

Standard deviation (SD) is the most commonly used measure of dispersion. It is a
measure of the spread of data about the mean. SD is the square root of the sum of squared
deviations from the mean divided by the number of observations.
