Chapter 1 Classification and Graphical Presentation [Becon 2025]
DHIRAJ GIRI
KATHMANDU UNIVERSITY
2025
The term Statistics is used to mean either ‘Statistical Data’ or
‘Statistical Methods’.
Inferential Statistics:
• Methods of statistics that allow us to make judgments,
predictions or decisions about a large group of individuals when we
have observed only a sample of the total group.
Collection:
Data are a collection of any number of related observations.
Collection of data constitutes the first step in a statistical
investigation.
Data may be obtained from either a primary source or a
secondary source.
A primary source is one that itself collects the data.
A secondary source is one that makes available data
that were collected by some other agency.
Depending on the source, statistical data are classified
under two categories.
Primary Data: Data that are collected for the first
time by the investigator to fulfill the objective of the study are
called primary data.
Primary data may be obtained by any of the
following methods:
Direct personal interview.
Indirect oral interview.
Information from correspondence.
Mailed questionnaire method.
Schedules sent through enumerators.
Secondary Data: Data that were not originally
collected by the investigator but obtained from published or
unpublished sources are known as secondary data.
Univariate Data
• This type of data consists of only one variable.
• The analysis of univariate data is thus the simplest form of
analysis since the information deals with only one quantity
that changes.
• It does not deal with causes or relationships; the main
purpose of the analysis is to describe the data and find
patterns that exist within it. An example of univariate
data is height.
Bivariate Data
• This type of data involves two different variables.
• The analysis of this type of data deals with causes and
relationships, and is done to find out the
relationship between the two variables.
• An example of bivariate data is temperature and ice
cream sales in the summer season. Bivariate data analysis
involves comparisons, relationships, causes and
explanations.
Multivariate Data
• When the data involves three or more variables, it is
categorized under multivariate.
Panel Data
Also referred to as longitudinal data, panel data contain
observations on different cross sections across time.
BASIC DEFINITIONS
POPULATION:
• The collection of all items of interest in a particular study.
• A population is usually large, so studying population
characteristics takes more time, effort and money.
SAMPLE:
• A set of data drawn from the population (subset of the
population available for observation)
PARAMETER:
• Any statistical characteristic of a population.
• Population mean, population median, population standard deviation and the
difference of two population means are examples of parameters.
[Diagram: classification of variables. Variable divides into Quantitative
variable (Continuous) and Qualitative variable (Binary variable, Multiple
categorical variable, Ordinal variable).]
Quantitative Variable
• A characteristic that can be quantified.
• Also known as a metric or numerical variable.
• Conveys information regarding amount.
Examples: number of children in a family, number of classrooms, the weights of
preschool children, diastolic blood pressure, temperature, rainfall,
humidity.
Qualitative Variable
• A characteristic that cannot be quantified.
• Also known as categorical or nominal.
• One that cannot be measured in the
usual sense; it can only be categorized.
• Conveys information regarding an attribute.
• Binary Variable: Gender, Alive or Dead, Yes or No.
• Multiple Categorical Variable
Blood types: A, B, AB, O
Ethnicity:
Data:
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81
Stem-and-leaf display:
2 | 3
3 | 9
4 | 79
5 | 569
6 | 07788
7 | 0245567789
8 | 11233689
9 | 11247
• Make a vertical list of the stems (e.g. for two-digit data points, the
stem is the digit in the 10's place).
• Draw a vertical line to the right of the stems.
• List the leaves (e.g. for two-digit data points, the leaf is the digit in the
1's place).
• Arrange the leaves in ascending order.
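The steps above can be sketched in a few lines of Python. This is a minimal illustration for two-digit data only, using the 35 observations given with the display; the function names `stem_and_leaf` and `render` are placeholders chosen here, not part of the course material.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for two-digit values, stems and leaves ascending."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in ascending order
        stem, leaf = divmod(value, 10)    # stem = 10's digit, leaf = 1's digit
        groups[stem].append(leaf)
    return dict(sorted(groups.items()))

def render(display):
    """Format each row as 'stem | leaves', with the bar to the right of the stem."""
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in display.items()]

data = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83,
        77, 68, 82, 97, 89, 81, 75, 74, 39, 67, 79, 83, 70, 78, 91,
        68, 49, 56, 94, 81]

display = stem_and_leaf(data)
lines = render(display)
print("\n".join(lines))
```

Each printed row corresponds to one stem, so the length of a row shows how many observations fall in that group of ten.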
Stem | Leaf
2 | 3
3 | 9
4 | 79
5 | 569
6 | 07788
7 | 0245567789
8 | 11233689
9 | 11247
Graphical Presentation of Data
• Statistical results may be presented through diagrams and
graphs.
• Geometrical representations of frequency distributions are
more popular than tabular presentations because:
They give a bird’s eye view of the entire data, and the
information presented is easily understood.
They are attractive to the eye and easy to
remember.
The impression they create lasts much longer than that
created by figures presented in tabular form.
They simplify complexity and facilitate comparison of
data.
They enable us to estimate some values at a glance.
• The commonly used graphs for representing the frequency
distribution are:
Bar diagram (Simple, Sub-divided, Multiple, Percentage)
Pie chart
Histogram
Frequency Polygon
Frequency Curve
Cumulative Frequency Curve or ‘Ogive’.
Bar Diagram
• The bar diagram is the most common type of diagram used in practice.
• A bar diagram is used to represent only one
variable.
• A bar diagram can present only one classification or
category of data.
• In a bar diagram only the length of the bar matters, not the
width.
• The lengths of the bars are in proportion to the figures they
represent.
• The width of the bars and the gap between two bars are uniform
throughout the diagram.
[Figure: bar diagram "Number of Students in Different University"; x-axis: University (A to E); y-axis: Number of students (0 to 600)]
Sub-divided Bar Diagram
• In a sub-divided bar diagram each bar represents the
magnitude of a given phenomenon and is further divided
into its various components.
• Each component occupies a part of the bar proportional to
its share in the total.
[Figure: sub-divided bar diagram "Joint Stock Companies"; x-axis: Year (1996 to 2000); y-axis: Number of companies; components: Public, Private]
Percentage Bar Diagrams
• In a percentage bar diagram the length of each bar is kept
equal to 100, and segments are cut in the bar to represent
the components as percentages of the aggregate.
[Figure: percentage bar diagram "Expenditure on Various Items by Family"; bars: A, B; vertical axis: 0% to 100%]
[Figure: pie chart with sectors Food, Rent, Clothing, Education, Litigation, Other; sector labels 40%, 20%, 20%, 10%, 5%, 5%]
How to Construct a Pie Chart
• In a pie chart the total value is represented by 360°.
• The first step is to convert the various component values into
corresponding degrees on the circle:
Angle of sector = (Value of component part / Total value) × 360°
• The second step is to draw a circle of appropriate size with a
compass.
• The third step is to mark points on the circle representing the
size of each sector with the help of a protractor.
• Usually sectors are arranged according to size, with the largest at
the top and the others in sequence running clockwise.
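The first step can be checked numerically with a short Python sketch. The component names and percentages below are assumptions for illustration: they pair the expenditure categories and sector labels from this chapter's pie chart in a guessed order, and `sector_angles` is a hypothetical helper name.

```python
def sector_angles(components):
    """Map each component to its sector angle: value / total * 360 degrees."""
    total = sum(components.values())
    # Multiply before dividing so whole-number inputs give exact angles.
    return {name: value * 360 / total for name, value in components.items()}

# Assumed pairing of categories and percentages (illustrative only).
shares = {"Food": 40, "Rent": 20, "Clothing": 20,
          "Education": 10, "Litigation": 5, "Other": 5}
angles = sector_angles(shares)

# List the sectors largest first, matching the usual drawing order.
for name, angle in sorted(angles.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {angle:.0f} degrees")
```

Whatever the component values, the angles always add up to 360°, which is a quick check before drawing.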
Histogram
• A histogram shows continuous data in ordered rectangular
columns without any gaps between the columns.
• The histogram displays the shape of a data set.
• A histogram is a bar graph that consists of vertical bars
constructed on a horizontal line that is marked off with
intervals for the variable being displayed. The intervals
correspond to those in a frequency distribution table. The
height of each bar is proportional to the number of
observations in that interval.
• The area of each bar represents the respective class
frequencies.
• When the class intervals are equal, all the bars have the same width
and their heights directly represent the class frequencies.
• If the class intervals are not all equal, the height of a bar must
decrease proportionally as the width of its class interval increases,
so that the area of the bar still represents the class frequency.
• In that case, plot frequency density (frequency divided by class
width) against the class.
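The rule for unequal class intervals can be illustrated with a small Python sketch. The equal-width classes are taken from this chapter's daily high temperature example; the merged 40-60 class is a hypothetical regrouping introduced here only to show an unequal interval, and `frequency_density` is an assumed helper name.

```python
def frequency_density(classes):
    """Return (lower, upper, density) per class, with density = frequency / class width."""
    return [(lower, upper, freq / (upper - lower)) for lower, upper, freq in classes]

# Equal class intervals (lower limit, upper limit, frequency).
equal = [(10, 20, 3), (20, 30, 6), (30, 40, 5), (40, 50, 4), (50, 60, 2)]

# Hypothetical regrouping: the last two classes merged into one wider class.
unequal = [(10, 20, 3), (20, 30, 6), (30, 40, 5), (40, 60, 6)]

for lower, upper, density in frequency_density(unequal):
    area = density * (upper - lower)   # bar area recovers the class frequency
    print(f"{lower}-{upper}: density {density:.2f}, area {area:.0f}")
```

The merged class is twice as wide, so its bar is drawn at half the height that its frequency alone would suggest, and the area still equals 6.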
Histogram: Daily High Temperature
Interval    Frequency
10 - 20     3
20 - 30     6
30 - 40     5
40 - 50     4
50 - 60     2
[Figure: histogram with no gaps between bars; x-axis: Temperature in Degrees; y-axis: Frequency]
Frequency Polygon:
• A frequency polygon is a graph constructed by using lines
to join the midpoints of each interval.
• The heights of the points represent the frequencies.
• It uses a line graph to represent quantitative data and
depicts the shape and trends of the data.
• Frequency polygon can be drawn in two ways:
i) With drawing a histogram and
ii) Without drawing a histogram
With Drawing a Histogram
• First draw a histogram of the given data.
• Find the mid-points of the upper horizontal side of each rectangle
and join them with straight lines.
• Consider two hypothetical classes with zero frequency, one at
each end of the histogram.
• Find the mid-points of these two hypothetical classes with
zero frequency.
• Close the polygon at both ends of the distribution by
extending it to these mid-points on the base line.
Without Drawing a Histogram
• Mark the class intervals for each class on the horizontal axis.
• Mark all the class marks on the horizontal axis.
• Corresponding to each class mark, plot the respective frequency.
• The frequency is plotted against the class mark and not the upper or
lower limit of any class.
• Join all the plotted points using line segments. The curve obtained
will be kinked.
• Consider two hypothetical classes with zero frequency, one at each
end of the distribution.
• Find the mid-points of these two hypothetical classes with zero
frequency.
• Close the polygon at both ends of the distribution by extending it to
these mid-points on the base line.
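The construction without a histogram amounts to computing a list of (class mark, frequency) points. The following Python sketch does this for the temperature classes used in this chapter; it assumes contiguous classes of equal width, and `polygon_points` is a hypothetical name.

```python
def polygon_points(classes):
    """Return the (class mark, frequency) vertices of a frequency polygon.

    Assumes contiguous classes of equal width, given as (lower, upper, frequency).
    """
    width = classes[0][1] - classes[0][0]
    first_lower = classes[0][0]
    last_upper = classes[-1][1]
    # Zero-frequency classes at each end close the polygon on the base line.
    padded = ([(first_lower - width, first_lower, 0)]
              + list(classes)
              + [(last_upper, last_upper + width, 0)])
    # The class mark is the mid-point of the class limits.
    return [((lower + upper) / 2, freq) for lower, upper, freq in padded]

classes = [(10, 20, 3), (20, 30, 6), (30, 40, 5), (40, 50, 4), (50, 60, 2)]
points = polygon_points(classes)
print(points)
```

Joining consecutive points with straight line segments gives the polygon; the first and last points lie on the base line at the mid-points of the two added classes.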
Cumulative Frequency Curve “Ogive”:
• The cumulative frequency curve, or ‘Ogive’, is a graphical
presentation of the cumulative frequency distribution of a
continuous variable.
• It allows us to quickly estimate the number of observations
that are less than or equal to a particular value.
Class    Frequency    Less than cf    More than cf
10-20    3            3               20
20-30    6            9               17
30-40    5            14              11
40-50    4            18              6
50-60    2            20              2
Total    20
[Figure: cumulative frequency curve for the table above]
More than Cumulative Frequency Curve
• More than cumulative frequencies are plotted
along the Y-axis against the lower boundaries of
the respective classes along the X-axis. (Plot the points
(x, y) using the lower limits as x and the corresponding cumulative frequencies as y.)
• The points so obtained are joined by a
smooth curve.
• The more than ogive curve slopes down from
left to right.
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class    Frequency    Less than cf    More than cf
10-20    3            3               20
20-30    6            9               17
30-40    5            14              11
40-50    4            18              6
50-60    2            20              2
Total    20
[Figure: more than cumulative frequency curve]
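Both cumulative frequency columns in the table can be reproduced from the class frequencies. The Python sketch below is one way to do it; the function name is hypothetical, and plotting the less than values against upper boundaries is the usual convention rather than something this section spells out.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Return (less_than, more_than) cumulative frequency lists."""
    less_than = list(accumulate(frequencies))
    total = less_than[-1]
    # "More than" cf of a class = observations at or above its lower boundary.
    more_than = [total - below for below in [0] + less_than[:-1]]
    return less_than, more_than

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]
less_than, more_than = cumulative_frequencies(frequencies)

# Points to plot: less than cf against upper boundaries (assumed convention),
# more than cf against lower boundaries (as described in the text).
less_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than)]
more_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than)]
print(less_points)
print(more_points)
```

The more than values fall from 20 to 2, which is why that curve slopes down from left to right.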
Shape of the Distribution
• The shape of the distribution is said to be symmetric if the
observations are balanced, or evenly distributed, about the
center.
[Figure: "Symmetric Distribution"; x-axis: values 1 to 9; y-axis: Frequency (0 to 10)]
Skewed
• The shape of the distribution is said to be skewed if the
observations are not symmetrically distributed around the
center.
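Positively Skewed Distribution
• A positively skewed distribution has a tail that extends to the right,
in the direction of positive values.
[Figure: positively skewed distribution; x-axis: values 1 to 9; y-axis: Frequency]
Negatively Skewed Distribution
• A negatively skewed distribution has a tail that extends to the left,
in the direction of negative values.
[Figure: negatively skewed distribution; x-axis: values 1 to 9; y-axis: Frequency]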
Stem-and-Leaf Diagram
• A simple way to see distribution details in a data set
• Separate the sorted data series into leading digits (the
stem) and the trailing digits (the leaves)
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Here, use the 10’s digit for the stem unit:
21 is shown as Stem 2, Leaf 1
38 is shown as Stem 3, Leaf 8
• Completed stem-and-leaf diagram:
Stem | Leaves
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
Using Other Stem Units
• Using the 100’s digit as the stem:
Data:
613, 632, 717, 722, 750, 827, 841, 859, 863, 891, 906, 928,
933, 955, 1034, 1047, 1056, 1140, 1169, 1224
• The completed stem-and-leaf display:
Stem | Leaves
6 | 13 32
7 | 17 22 50
8 | 27 41 59 63 91
9 | 06 28 33 55
10 | 34 47 56
11 | 40 69
12 | 24
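Changing the stem unit only changes where each number is split. As a hedged sketch, the Python function below takes the stem unit as a parameter (an assumed interface, not from the course material) and zero-pads the leaves so that 906 is shown with leaf 06.

```python
def stem_and_leaf(values, stem_unit=10):
    """Split each value into a stem (value // stem_unit) and a leaf (value % stem_unit)."""
    leaf_width = len(str(stem_unit)) - 1      # 1 digit for unit 10, 2 digits for unit 100
    display = {}
    for value in sorted(values):              # sorted input keeps stems and leaves ascending
        stem, leaf = divmod(value, stem_unit)
        display.setdefault(stem, []).append(f"{leaf:0{leaf_width}d}")  # zero-pad the leaf
    return display

data = [613, 632, 717, 722, 750, 827, 841, 859, 863, 891, 906, 928,
        933, 955, 1034, 1047, 1056, 1140, 1169, 1224]

display = stem_and_leaf(data, stem_unit=100)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(leaves)}")
```

Calling the same function with `stem_unit=10` on two-digit data gives the 10's digit display used earlier.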
Relationships Between Variables
• Graphs illustrated so far have involved only a single variable
• When two variables exist other techniques are used:
Categorical (Qualitative) Variables
Numerical (Quantitative) Variables