
Chapter 1 Classification and Graphical Presentation [Becon 2025]

The document provides an introduction to statistics, covering key concepts such as statistical data, methods, and classifications. It distinguishes between descriptive and inferential statistics, outlines data collection methods, and explains types of data including univariate, bivariate, and multivariate. Additionally, it discusses scales of measurement, classification, and frequency distribution, emphasizing the importance of organizing and interpreting data effectively.

STATISTICS I

CHAPTER 1: INTRODUCTION, CLASSIFICATION AND TABULATION

DHIRAJ GIRI
KATHMANDU UNIVERSITY
2025
The term Statistics is used to mean either ‘Statistical Data’ or
‘Statistical Methods’.

Statistical Data: numerical descriptions of quantitative aspects of things. These descriptions may take the form of counts or measurements.

Statistical Methods: the collection, organization, presentation, analysis and interpretation of statistical (numerical) data.

Statistical methods can be separated into two broad categories:


• Descriptive statistics and
• Inferential statistics
Descriptive Statistics:
• The descriptive statistics consists of those statistical methods, which
tell us how to describe the characteristics of a body of data.
• They deal with the collection, organization and presentation of data
and the calculation of measures, which describe the data in various
ways.

Inferential Statistics:
• Inferential statistics consists of those statistical methods that allow us to make judgments, predictions or decisions about a large group of individuals when we have observed only a sample of the total group.
Collection:
Data are a collection of any number of related observations.
• Collection of data constitutes the first step in a statistical investigation.
• Data may be obtained from either a primary source or a secondary source.
  o A primary source is one that itself collects the data.
  o A secondary source is one that makes available data that were collected by some other agency.
• Depending on the source, statistical data are classified under two categories.
• Primary Data: Data that are collected for the first time by the investigator to fulfill the objective of the study are called primary data.
• Primary data may be obtained by any of the following methods:
  o Direct personal interview.
  o Indirect oral interview.
  o Information from correspondence.
  o Mailed questionnaire method.
  o Schedules sent through enumerators.
• Secondary Data: Data that were not originally collected by the investigator but were obtained from published or unpublished sources are known as secondary data.
Univariate Data
• This type of data consists of only one variable.
• The analysis of univariate data is thus the simplest form of
analysis since the information deals with only one quantity
that changes.
• It does not deal with causes or relationships and the main
purpose of the analysis is to describe the data and find
patterns that exist within it. The example of a univariate
data can be height.
Bivariate Data
• This type of data involves two different variables.
• The analysis of this type of data deals with causes and relationships, and is done to find out the relationship between the two variables.
• An example of bivariate data is temperature and ice cream sales in the summer season. Bivariate data analysis involves comparisons, relationships, causes and explanations.
Multivariate Data
• When the data involves three or more variables, it is
categorized under multivariate.

Time Series Data


• A time series is a sequence of information that attaches a
time period to each value.
• The value can be pretty much anything measurable that
depends on time in some way, like prices, humidity, or a
number of people.
• In time series data, all time periods must be equal and clearly defined.
Cross-sectional Data
Data collected by observing many subjects (such as
individuals, firms or countries/regions) at the same point of
time, or without regard to differences in time.

Panel Data
Panel data, also referred to as longitudinal data, are data that contain observations about different cross sections across time.
BASIC DEFINITIONS
POPULATION:
• The collection of all items of interest in a particular study.
• A population is usually large, so studying population characteristics requires more time, effort and money.

SAMPLE:
• A set of data drawn from the population (subset of the
population available for observation)
Parameter:
• Any statistical characteristic of a population.
Population mean, population median, population standard deviation, difference of two population means are examples of parameters.
• Parameters describe the distribution of a population.
• Parameters are usually unknown.
• Usually denoted by Greek letters.
Statistic:
• Any statistical characteristic of a sample.
Sample mean, sample median, sample standard deviation, sample proportion, odds ratio, sample correlation coefficient are some examples of statistics.
• A statistic describes the distribution of a sample.
• The value of a statistic is known and varies from sample to sample.
• Statistics are used for making inferences about parameters.
• Usually denoted by Latin letters.
Variable
• Any characteristic that varies in amount or magnitude.
• If we observe a characteristic and find that it takes on different values in different persons, places, or things, we label the characteristic a variable.
Examples: heart rate, the heights of adult males, diastolic blood pressure, gender, blood type, treatment effect, number of students, number of cars.
Types of Variable
• Quantitative variable
  o Discrete
  o Continuous
• Qualitative variable
  o Binary variable
  o Multiple categorical variable
  o Ordinal variable
Quantitative Variable
• Characteristic that can be quantified.
• Also known as a metric or numerical variable.
• Conveys information regarding amount.
Examples: number of children in a family, number of classrooms, the weights of preschool children, diastolic blood pressure, temperature, rainfall, humidity.
Qualitative Variable
• Characteristic that cannot be quantified.
• Also known as a categorical or nominal variable.
• One that cannot be measured in the usual sense; it can only be categorized.
• Conveys information regarding an attribute.
• Binary Variable: Gender, Alive or Dead, Yes or No.
• Multiple Categorical Variable
  o Blood types: A, B, AB, O
  o Ethnicity:
• Ordinal Variable: there is an order in the categories.
  o Your opinion on something: unsatisfactory, normal, very satisfactory
ID       Age  Gender  Educational level   Occupation  Height  Weight
2025655  27   male    graduate            teacher     165     71.5
2025653  22   male    undergraduate       doctor      160     74
2025830  25   female  junior high school  worker      158     68
2022543  23   male    senior high school  student     161     69
2022466  25   female  senior high school  worker      159     62
2024535  27   female  elementary          farmer      157     68
2025834  20   male    graduate            cadre       158     66
2019464  24   male    graduate            student     158     70.5
Scales of Measurement

• A scale of measurement describes how variables are defined and categorized.
• Psychologist Stanley Stevens developed the four common scales of measurement: Nominal, Ordinal, Interval and Ratio.
• Each scale of measurement has properties that determine how to properly analyze the data.
• The properties evaluated are Identity, Magnitude, Equal Intervals and a Minimum Value of Zero.
Properties of Measurement
• Identity: Identity refers to each value having a unique
meaning.
• Magnitude: Magnitude means that the values have
an ordered relationship to one another, so there is a
specific order to the variables.
• Equal Intervals: Equal intervals mean that data points along the scale are equally spaced, so the difference between data points one and two is the same as the difference between data points five and six.
• A Minimum Value of Zero: A minimum value of zero means the scale has a true zero point. Degrees of temperature, for example, have no true zero: they can fall below zero and still have meaning. Weight does have one: if you weigh nothing, you don’t exist.
1. Nominal Scale of Measurement
The Nominal Scale of Measurement defines the
identity property of data. This scale has certain
characteristics, but doesn’t have any form of numerical
meaning. The data can be placed into categories but
can’t be multiplied, divided, added or subtracted from
one another. It’s also not possible to measure the
difference between data points.
Examples of nominal data include Name, Caste and
Ethnicity, Marital Status of Person, Eye Color, Country
of Birth. Nominal data can be broken down again into
three categories:
• Nominal with order: Some nominal data can be
sub-categorised in order, such as “Cold, Warm, Hot
and Very hot.”
• Nominal without order: Nominal data can also be
sub-categorised as nominal without order, such as
male and female.
2. Ordinal Scale of Measurement
• The ordinal scale contains qualitative data; ‘ordinal’ means ‘order’. It places variables in order or rank, only permitting a value to be measured as higher or lower on the scale. The scale cannot give a precise comparison between two categories.
• The Ordinal Scale defines data that is placed in a specific
order. While each value is ranked, there’s no information that
specifies what differentiates the categories from each other.
• These values can’t be added to or subtracted from.
• An example of this kind of data would include satisfaction data
points in a survey, where ‘one = happy, two = neutral and three
= unhappy.’ Where someone finished in a race also describes
ordinal data. While first place, second place or third place
shows what order the runners finished in, it doesn’t specify how
far the first-place finisher was in front of the second-place
finisher.
types.
Familiarity:
The familiarity ordinal scale can help you assess the level of
knowledge your respondents have about the topic.
Very Familiar, Quite Familiar, Moderately Familiar,
Somewhat Familiar, Not at all Familiar
Agreement:
This scale can help determine how much your respondents
agree/disagree with your statement.
Strongly Agree, Agree, Neutral, Disagree, Strongly
Disagree
Frequency:
This ordinal scale can inform you how often an activity is performed to
help you evaluate the behavior pattern.
Always, Often, Sometimes, Rarely, Never
Satisfaction:
The best way to understand how satisfied your customers, employees,
and prospects are with your services and products.
Very Satisfied, Satisfied, Neutral, Dissatisfied, Very Dissatisfied
3. Interval Scale of Measurement
• The interval scale contains properties of nominal and
ordered data, but the difference between data points can
be quantified. This type of data shows both the order of
the variables and the exact differences between the
variables.
• They can be added to or subtracted from each other, but not multiplied or divided. For example, 40 degrees is not twice as warm as 20 degrees.
• This scale is also characterized by the fact that zero is an existing value on the scale rather than the absence of the quantity. For example, if you measure temperature in degrees, zero degrees is still a temperature.
• Data points on the interval scale have the same
difference between them. The difference on the scale
between 10 and 20 degrees is the same between 20 and
30 degrees. This scale is used to quantify the difference
between variables, whereas the other two scales are
4. Ratio Scale of Measurement
The ratio scale of measurement has all four properties of measurement. The data are defined by an identity, can be classified in order, have equal intervals and can be broken down into exact values.
Weight, height and distance are all examples of ratio
variables. Data in the ratio scale can be added,
subtracted, divided and multiplied.
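
The difference between the two scales can be checked with a little arithmetic. The sketch below uses made-up weights and temperatures (both are assumptions for illustration) and the standard Celsius-to-Kelvin offset of 273.15 to show that a ratio of weights is meaningful, while a ratio of Celsius temperatures changes when the unit changes.

```python
# Ratio scale: weight has a true zero, so ratios are meaningful.
weight_a_kg = 80.0   # hypothetical value
weight_b_kg = 40.0   # hypothetical value
weight_ratio = weight_a_kg / weight_b_kg   # 2.0: A is twice as heavy as B

# Interval scale: Celsius has an arbitrary zero, so ratios depend on the unit.
temp_a_c, temp_b_c = 40.0, 20.0
celsius_ratio = temp_a_c / temp_b_c                         # 2.0 in Celsius
kelvin_ratio = (temp_a_c + 273.15) / (temp_b_c + 273.15)    # about 1.07 in Kelvin

# Differences are meaningful on both scales and do not depend on the zero point.
celsius_diff = temp_a_c - temp_b_c
kelvin_diff = (temp_a_c + 273.15) - (temp_b_c + 273.15)

print(weight_ratio, celsius_ratio, round(kelvin_ratio, 3))
```

The same two temperatures give a ratio of 2 in Celsius and roughly 1.07 in Kelvin, which is why multiplication and division are reserved for ratio-scale data.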

Ratio scales also differ from interval scales in that the scale has a ‘True Zero’. The number zero means the complete absence of the quantity being measured. An example of this is height or
weight, as someone cannot be zero centimeters tall or
weigh zero kilos – or be negative centimeters or negative
kilos. Examples of the use of this scale are calculating
shares or sales. Of all types of data on the scales of
measurement, data scientists can do the most with ratio
Classification and Tabulation
• Data obtained from primary sources are in a raw state.
• Unorganized and shapeless masses of collected data cannot be interpreted rapidly.
• Collected data must be edited so that omissions, inconsistencies, irrelevant answers and wrong computations are corrected or adjusted.
• In order to make data easily understandable, the first task is to condense and simplify them.
• The procedure adopted for this purpose is known as Classification.
  o Classification is the grouping of related facts into classes.
  o Classification is the sorting out of a heterogeneous mass of data into a number of homogeneous groups.
Objective of Classification
• To condense the mass of data.
• To pinpoint the most significant features of the data (for example, the range over which the items are spread and the shape of the distribution, i.e. trends, which values appear most often and which values the data tend to group around).
• To facilitate the comparison
• To enable mathematical treatment.
The Data can be classified on the following four bases.
• Geographical: On the basis of geographical location.
• Chronological: On the basis of time.
• Qualitative: According to some attribute that cannot be measured (that is, cannot be described numerically), such as sex, color of hair, literacy and religion. When classification is made on the basis of attributes, groups are differentiated either by the presence or absence of the attribute or by its differing qualities.
• Quantitative: According to some characteristic that can be measured (described numerically), such as height, weight, income, sales, profit and production.
On the Basis of Geographical Location
Region         Number of schools registered
Eastern        2404
Central        1518
Western        1827
Mid-western    772
Far western    53

On the Basis of Time
Year                    2061  2062  2063  2064
Population in million   16    18    20    23

According to some Attribute
Employee
Skilled           Unskilled
Male    Female    Male    Female

According to some Characteristics that can be Measured
No. of Children   No. of families
0                 15
1                 20
2                 22
3                 16
4                 7
Frequency Distribution
• A frequency distribution is a tabular presentation of data, having two
different columns.
• The very first column that is the left column (called classes or groups)
includes numerical values/intervals on a variable being studied.
• The right column is a list of the frequencies (the number of times a
certain value of the variables repeated in a given set of data), or
number of observations, for each item/class.
• It is an actual observed distribution.

Distance Travelled (in miles)   Frequency
5                               24
6                               16
8                               10
10                              8
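
A frequency distribution like the one above can be produced by counting how often each value occurs. The following is a minimal sketch using Python's `collections.Counter`; the raw list `distances` is hypothetical and was constructed only so that it reproduces the frequencies in the table.

```python
from collections import Counter

# Hypothetical raw observations, chosen to reproduce the table above.
distances = [5] * 24 + [6] * 16 + [8] * 10 + [10] * 8

# Count how many times each distinct value occurs.
frequency = Counter(distances)

# Left column: values in ascending order; right column: their frequencies.
table = sorted(frequency.items())
for value, count in table:
    print(f"{value:>8} {count:>10}")
```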
Three Types of Frequency Distributions
• Categorical frequency distributions can be used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
Examples: political affiliation, religious affiliation, blood type etc.
• Ungrouped frequency distributions can be used for data that can be enumerated, when the range of values in the data set is not large.
Examples: number of miles your instructors have to travel from home to campus, number of girls in a 4-child family etc.
• Grouped frequency distributions can be used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.
• A frequency distribution is a presentation of a number of observations of an attribute, or values of a variable, arranged according to their magnitudes, either individually in the case of a discrete series or in ranges (i.e. class intervals) in the case of both discrete and continuous series.

Categorical:
Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1

Ungrouped:
X     Frequency (f)
2     1
5     9
8     5
12    3
18    3

Grouped:
Class      Frequency (f)
10 - 20    1
20 - 30    9
30 - 40    5
40 - 50    3
50 - 60    3
Construction of Discrete Frequency Distribution
1. Prepare three columns: one for the variable, one for the tally bars and the third for the frequency corresponding to the size or value of the variable.
2. In the first column, place all possible values of the variable from the lowest to the highest.
3. In the second column, put a bar (vertical line) opposite the particular value to which it relates.
4. In the frequency column, place the frequency, as counted with the help of the bars, opposite the value or size of the variable.

Marks Obtained by Students (x)   Tally Bars   Number of Students, Frequency (f)
10                               II           2
15.5                             III          3
20                               II           2
22.5                             IIII         4
30                               III          3
45                               II           2
48                               II           2
Total                                         18
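
The four construction steps can be sketched in code. The list `marks` below is hypothetical raw data chosen to match the frequencies in the table, and the helper `tally` is an illustrative name; it writes every completed group of five as `||||/`, which is one common tally convention and an assumption here.

```python
from collections import Counter

def tally(count):
    """Return tally marks: each full group of five as '||||/', then the remainder."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

# Hypothetical marks, chosen to reproduce the frequencies in the table above.
marks = [10] * 2 + [15.5] * 3 + [20] * 2 + [22.5] * 4 + [30] * 3 + [45] * 2 + [48] * 2

# One row per distinct value, lowest to highest: (value, tally bars, frequency).
rows = [(x, tally(f), f) for x, f in sorted(Counter(marks).items())]
total = sum(f for _, _, f in rows)

for x, bars, f in rows:
    print(f"{x:>6} {bars:<8} {f:>3}")
print(f"{'Total':>6} {'':<8} {total:>3}")
```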
Grouped Frequency Distribution
• Classifying the data into different classes by dividing the entire range
of the values of the variable into a suitable number of groups called
classes and recording the number of observations in each group (or
class).
• In a grouped frequency distribution, the data are divided into similar categories or classes, and the number of observations that fall into each class is counted.
• It is based on the assumption that, frequencies within each class are
spread evenly over the range of the class-intervals. There will be as
many items below the mid–point as above it.
Types of Group Frequency Distribution
There are two methods of classifying the data according to class intervals, namely
(i) Inclusive Type and
(ii) Exclusive Type

Marks in Statistics (Class)   Tally bars       Number of students (f)
0 – 5                         ||||             5
5 – 10                        |||| ||||        9
10 – 15                       |||| |||| |||    13
15 – 20                       |||| ||          7
20 – 25                       |||| |||         8
25 – 30                       |||              3
Total                                          45
Inclusive Type of Group Frequency Distribution
• The classes of this type are such that both the lower and upper
limits of a particular class are included in the same class.
Room 10-19 20-29 30-39 40-49
No. of Hotels 5 10 2 3

Exclusive Type of Group Frequency Distribution


• The classes of this type do not include the upper class limit.
• The class intervals are arranged so that the upper limit of one class is the lower limit of the next class, so there is no gap between the classes.
Marks Obtained 10-20 20-30 30-40 40-50
No. of Students 10 30 2 5
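
The practical difference between the two types is where a boundary value is placed. The sketch below assumes equal-width classes starting at a known lower limit; the function names `exclusive_class` and `inclusive_class` are illustrative, and the inclusive version assumes integer data such as the room counts above.

```python
def exclusive_class(value, lower, width):
    """Return (lo, hi) of the exclusive class with lo <= value < hi."""
    k = int((value - lower) // width)
    lo = lower + k * width
    return lo, lo + width

def inclusive_class(value, lower, width):
    """Return (lo, hi) of the inclusive class with lo <= value <= hi (integer data)."""
    k = (value - lower) // width
    lo = lower + k * width
    return lo, lo + width - 1

# A mark of exactly 20 belongs to 20-30, not 10-20, in the exclusive type.
print(exclusive_class(20, 10, 10))   # (20, 30)
# A hotel with 19 rooms belongs to 10-19 in the inclusive type.
print(inclusive_class(19, 10, 10))   # (10, 19)
```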
Terms of Group Frequency Distribution
• Class Limits: The class limits are the range values of a class, which may or may not be included in the class. The lower range value is the lower limit and the upper range value is the upper limit of the class.
• Class Frequency: The number of observations corresponding to a particular class is known as the frequency of that class, or class frequency.
• Class Interval: The difference between the upper and lower limits of a class is known as the length of the class interval.
• Class Mid-Point / Class Mark: It is the arithmetic mean of the two limits of the class.
\[ \text{Mid-point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2} \]
• Cumulative Frequency: The sum of the frequencies of more than one class.
• Relative Frequency: The frequency of a class in relation to the total frequency. It is the ratio of the class frequency to the total frequency.
\[ \text{Relative frequency} = \frac{\text{Frequency of class}}{\text{Total frequency}} \]
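
As a worked example of these terms, the sketch below computes class marks and relative frequencies for the exclusive-type marks distribution shown earlier (classes 10-20 to 40-50 with frequencies 10, 30, 2 and 5). The variable names are illustrative.

```python
# Exclusive classes and their frequencies, taken from the marks table above.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [10, 30, 2, 5]

total = sum(frequencies)

# Class mark: arithmetic mean of the two class limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency: class frequency divided by total frequency.
relative = [f / total for f in frequencies]

for (lo, hi), m, f, r in zip(classes, midpoints, frequencies, relative):
    print(f"{lo}-{hi}  mid-point={m}  f={f}  relative={r:.3f}")
```

The relative frequencies always add up to 1, which is a quick check on the arithmetic.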
Cumulative Frequency:
• When the frequencies of two or more classes are added up, the totals are called cumulative frequencies.
• If we are interested in the total number of observations with a value "less than" or "more than" a particular value of the variable, we must consider the cumulative frequency distribution.
• Less than Cumulative Frequency: When the values (classes) are arranged in ascending order of magnitude and the frequencies of two or more classes are added up from the top of the table, the resulting frequency is the less than cumulative frequency.
  o The less than cumulative frequency for any value (or class) is obtained by successively adding the frequencies of all the previous values (or classes), including the frequency of the value (class) against which the total is written.
  o It reflects the total number of observations up to the end of that particular class.
  o Less than cumulative frequency is associated with the upper limit of the class.
• More than Cumulative Frequency: When the values (classes) are arranged in ascending order of magnitude and the frequencies of two or more classes are added up from the bottom of the table, the resulting frequency is the more than cumulative frequency.
  o More than cumulative frequency is associated with the lower limit of the class.

Marks (x)   Frequency (f)   Less than cf   More than cf
5 - 10      3               3              27 + 3 = 30
10 - 15     4               3 + 4 = 7      23 + 4 = 27
15 - 20     7               7 + 7 = 14     16 + 7 = 23
20 - 25     4               14 + 4 = 18    12 + 4 = 16
25 - 30     4               18 + 4 = 22    8 + 4 = 12
30 - 35     3               22 + 3 = 25    5 + 3 = 8
35 - 40     3               25 + 3 = 28    2 + 3 = 5
40 - 45     1               28 + 1 = 29    1 + 1 = 2
45 - 50     1               29 + 1 = 30    1
Total       30
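
Both cumulative columns are running totals, one taken from the top of the table and one from the bottom. A minimal sketch using `itertools.accumulate` on the frequencies of the marks table above:

```python
from itertools import accumulate

# Class frequencies for 5-10, 10-15, ..., 45-50, in ascending order of class.
frequencies = [3, 4, 7, 4, 4, 3, 3, 1, 1]

# Less than cf: running total from the top of the table.
less_than_cf = list(accumulate(frequencies))

# More than cf: running total from the bottom, then restored to table order.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)
print(more_than_cf)
```

The last less than value and the first more than value both equal the total frequency, 30.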
• Open End Classes: If the lower limit of the first class, the upper limit of the last class, or both are not specified, the classes are called open end classes.
Marks Below 30 30-40 40-50 Above 50
No. of Students 10 5 7 3
• Correction Factor: To change inclusive type class intervals into exclusive type class intervals, a correction factor is used.
\[ \text{Correction factor} = \frac{\text{Lower limit of second class} - \text{Upper limit of first class}}{2} \]
Then the new exclusive classes are obtained as
Lower class boundary = lower limit of the inclusive class – correction factor
Upper class boundary = upper limit of the inclusive class + correction factor
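
Applied to the inclusive hotel-room classes shown earlier (10-19, 20-29, 30-39, 40-49), the correction factor works out as in this short sketch; the variable names are illustrative.

```python
# Inclusive classes from the hotel rooms example.
inclusive = [(10, 19), (20, 29), (30, 39), (40, 49)]

# Half the gap between the upper limit of the first class
# and the lower limit of the second class.
correction = (inclusive[1][0] - inclusive[0][1]) / 2

# Widen every class by the correction factor on both sides.
exclusive = [(lo - correction, hi + correction) for lo, hi in inclusive]

print(correction)   # 0.5
print(exclusive)    # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
```

After the correction, the upper boundary of each class equals the lower boundary of the next, so there are no gaps.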
Sturges’ Rule: Sturges’ rule gives a suitable number of classes. According to Sturges’ rule,
\[ K = 1 + 3.322 \log_{10} N \]
where K = number of classes and N = number of observations.
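
A short sketch of the rule in code. The function name `sturges_classes` is illustrative, and rounding K to the nearest whole number is an assumption; some texts round up instead.

```python
import math

def sturges_classes(n):
    """Suggested number of classes K = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 1 + 3.322 * math.log10(n)
    return round(k)

# For 30 observations: 1 + 3.322 * 1.477 is about 5.9, so about 6 classes.
print(sturges_classes(30))
# For 100 observations: 1 + 3.322 * 2 = 7.644, so about 8 classes.
print(sturges_classes(100))
```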
Points to Remember while Constructing Group Frequency Distribution
• The classes should be clearly defined.
• The classes should be exhaustive, i.e. each of the given values should
be included in one of the classes.
• The classes should be mutually exclusive (no data point falls into more than one class) and non-overlapping.
• The classes should be of equal width.
• Open-end classes, such as less than ‘a’ or greater than ‘b’, should be avoided as far as possible.
• The number of classes should be neither too large nor too small.

We can find a suitable length of class interval as
\[ \text{Suitable length} = \frac{\text{Next to largest observation} - \text{Smallest observation}}{\text{Required number of classes}} \]
Essential Parts of Statistical Table
In general, a statistical table consists of the following eight parts.
i) Table Number: Identifies the table for reference.
ii) Title: It indicates scope and nature of contents in concise form.
iii) Caption: They are heading and subheading of columns.
iv) Stubs: They are heading and subheading in rows.
v) Body: It contains numerical information.
vi) Head note: The head-note (or prefatory note) contains the unit of
measurement of data. It is usually placed just below the title or at the
right hand top corner of the table.
vii) Foot note: A foot note is given at the bottom of a table. It helps in
clarifying the point which is not clear in the table. A foot note may be
keyed to the title or to any column or to any row heading. It is
identified by symbols such as *,+,@,£ etc.
viii) Source note: It indicates the source from which the data were taken.
Parts of Statistical Table
[Figure: layout of a statistical table showing the title, head notes (if any), table number, captions, stubs, body, row total, column total, grand total, source notes and foot notes]
Marks in Statistics (Class)   Tally bars       Number of students (f)   Less than cf   More than cf   Relative frequency
0 – 5                         ||||             5                        5              45             5/45
5 – 10                        |||| ||||        9                        14             40             9/45
10 – 15                       |||| |||| |||    13                       27             31             13/45
15 – 20                       |||| ||          7                        34             18             7/45
20 – 25                       |||| |||         8                        42             11             8/45
25 – 30                       |||              3                        45             3              3/45
Total                                          45
Stem and Leaf Display
• A display shaped like a histogram.
• Used to find the shape of the distribution without losing information on the values of the variable.

Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81

Stem | Leaf
2    | 3
3    | 9
4    | 79
5    | 569
6    | 07788
7    | 0245567789
8    | 11233689
9    | 11247
• Make a vertical list of the stems (e.g. for two-digit data points, the stems are the digits in the tens place).
• Draw a vertical line to the right of the stems.
• List the leaves (e.g. for two-digit data points, the leaves are the digits in the units place).
• Arrange the leaves in ascending order.
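
The steps above can be sketched in code for two-digit data. The list `raw` holds the 35 raw observations from this section; splitting each value with `divmod(value, 10)` gives the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Raw data from the stem and leaf example in this section.
raw = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83,
       77, 68, 82, 97, 89, 81, 75, 74, 39, 67, 79, 83, 70, 78, 91,
       68, 49, 56, 94, 81]

stems = defaultdict(list)
for value in sorted(raw):            # sorting first keeps the leaves in ascending order
    stem, leaf = divmod(value, 10)   # tens digit, units digit
    stems[stem].append(leaf)

# One line per stem, lowest stem first, leaves written side by side.
display = {stem: "".join(str(leaf) for leaf in leaves)
           for stem, leaves in sorted(stems.items())}

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```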

Graphical Presentation of Data
• Statistical results may be presented through diagrams and graphs.
• Geometrical representations of frequency distributions are more popular than tabular presentations because:
  o They give a bird’s eye view of the entire data, and the information presented is easily understood.
  o They are attractive to the eye and are easy to remember.
  o The impression they create lasts much longer than that created by figures presented in tabular form.
  o They simplify complexity and facilitate comparison of data.
  o They enable us to estimate some values at a glance.
• The commonly used graphs for representing a frequency distribution are:
  o Bar diagram (Simple, Sub-divided, Multiple, Percentage)
  o Pie chart
  o Histogram
  o Frequency Polygon
  o Frequency Curve
  o Cumulative Frequency Curve or ‘Ogive’.
Bar Diagram
• The bar diagram is the most common type of diagram used in practice.
• A simple bar diagram is used to represent only one variable.
• A simple bar diagram can present only one classification or category of data.
• In a bar diagram only the length of the bar matters, not the width.
• The lengths of the bars are in proportion to the figures they represent.
• The width of the bars and the gap between two bars are uniform throughout the diagram.
[Figure: Bar diagram "Number of Students in Different University"; x-axis: University (A to E), y-axis: Number of students]
Sub-divided Bar Diagram
• In subdivided bar diagram each bar represents the
magnitude of a given phenomenon and is further divided
in its various components.
• Each component occupies a part of the bar proportional to
its share in the total.
[Figure: Sub-divided bar diagram "Expenditure on Various Items by Family"; bars A and B; components: Food, Clothing, Education, Rent, Fuel, Misc.]
[Figure: Sub-divided bar diagram "Joint Stock Companies"; x-axis: Year (1996 to 2000), y-axis: Joint stock company; components: Private, Public]
Multiple Bar Diagrams
• In a multiple bar diagram, two or more sets of interrelated data are represented.
[Figure: Multiple bar diagram "Number of Companies"; x-axis: Year (1996 to 2000), y-axis: Number of companies; series: Public, Private]
Percentage Bar Diagrams
• In a percentage bar diagram the length of each bar is kept equal to 100, and segments are cut in the bar to represent the components as percentages of the aggregate.
[Figure: Percentage bar diagram "Expenditure on Various Items by Family"; bars A and B on a 0% to 100% scale; components: Food, Clothing, Education, Rent, Fuel, Misc.]
Pie Diagram
• Pie diagrams are used to show percentage breakdowns of
categorical data.
• The entire diagram looks like a pie, and the components resemble slices cut from the pie.
• In a pie diagram, sectors of a circle are used to represent the components of the total.
• The area of each sector is proportional to the component part of the total that the sector represents.
• It is very useful for displaying nominal or ordinal categories of data.
[Figure: Pie diagram "Monthly Expenditure of a Family"; sectors: Food, Rent, Clothing, Education, Litigation, Other; sector labels: 40%, 20%, 20%, 10%, 5%, 5%]
How to Construct a Pie Chart
• In a pie chart the total value is represented by 360°.
• The first step is to convert the various component values into the corresponding degrees on the circle.
\[ \text{Angle of sector} = \frac{\text{Value of component part}}{\text{Total value}} \times 360^\circ \]
• The second step is to draw a circle of appropriate size with a compass.
• The third step is to mark points on the circle representing the size of each sector with the help of a protractor.
• Usually sectors are arranged according to size, with the largest at the top and the others in sequence running clockwise.
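
The first step, converting component values into angles, is simple arithmetic. In the sketch below the expenditure figures are hypothetical and serve only to illustrate the formula; sorting the sectors from largest to smallest follows the arrangement described in the last point.

```python
# Hypothetical monthly expenditure by item (illustrative values only).
expenditure = {"Food": 4000, "Rent": 2000, "Clothing": 1000,
               "Education": 2000, "Other": 1000}

total = sum(expenditure.values())

# Angle of sector = (value of component part / total value) * 360 degrees.
angles = {item: value * 360 / total for item, value in expenditure.items()}

# Arrange sectors by size, largest first.
ordered = sorted(angles.items(), key=lambda pair: pair[1], reverse=True)
for item, angle in ordered:
    print(f"{item:<10} {angle:6.1f} degrees")
```

The angles always sum to 360 degrees, which is a useful check before drawing.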
Histogram
• A histogram shows continuous data in ordered rectangular
columns without any gaps between the columns.
• The histogram displays a shape of a data set.
• A histogram is a bar graph that consists of vertical bars
constructed on a horizontal line that is marked off with
intervals for the variable being displayed. The intervals
correspond to those in a frequency distribution table. The
height of each bar is proportional to the number of
observations in that interval.
• The area of each bar represents the respective class
frequencies.
• When the class intervals are equal, all the bars have the same width and their heights directly represent the class frequencies.
• If the length of a class interval increases, the height of the corresponding bar must decrease proportionally.
• If the class intervals are not all equal, plot frequency density (class frequency divided by class width) against the class.
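
For unequal class intervals, the bar heights are frequency densities, so that the area of each bar still equals the class frequency. The classes and frequencies in this sketch are hypothetical.

```python
# Hypothetical classes of unequal width and their frequencies.
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [5, 8, 12, 8]

# Frequency density = class frequency / class width; this is the bar height.
density = [f / (hi - lo) for (lo, hi), f in zip(classes, frequencies)]

# Area of each bar (height * width) recovers the class frequency.
areas = [d * (hi - lo) for d, (lo, hi) in zip(density, classes)]

for (lo, hi), f, d in zip(classes, frequencies, density):
    print(f"{lo}-{hi}: frequency={f}, density={d:.2f}")
```

Note that the class 20-40 has the largest frequency here but not the tallest bar, because its width is twice that of 10-20.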
Histogram: Daily High Temperature
Interval   Frequency
10 - 20    3
20 - 30    6
30 - 40    5
40 - 50    4
50 - 60    2
[Figure: Histogram of the table above; x-axis: Temperature in Degrees, y-axis: Frequency; no gaps between bars]
Frequency Polygon:
• A frequency polygon is a graph constructed by using lines
to join the midpoints of each interval.
• The heights of the points represent the frequencies.
• It uses a line graph to represent quantitative data and
depict the shape and trends of the data
• A frequency polygon can be drawn in two ways:
i) By first drawing a histogram and
ii) Without drawing a histogram
With Drawing a Histogram
• First draw a histogram of the given data.
• Find the mid-point of the upper horizontal side of each rectangle and join these points with straight lines.
• Consider two hypothetical classes with zero frequency, one on either side of the histogram.
• Find the mid-points of these two hypothetical classes.
• Close the polygon at both ends of the distribution by extending it to these mid-points on the base line.
Without Drawing a Histogram
• Mark the class intervals for each class on the horizontal axis.
• Mark all the class marks on the horizontal axis.
• Corresponding to each class mark, plot the respective frequency.
• The frequency is plotted against the class mark and not against the upper or lower limit of any class.
• Join all the plotted points using line segments. The curve obtained will be kinked.
• Consider two hypothetical classes with zero frequency, one on either side of the distribution.
• Find the mid-points of these two hypothetical classes.
• Close the polygon at both ends of the distribution by extending it to these mid-points on the base line.
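
The points of a frequency polygon can be computed without drawing anything. The sketch below uses the daily high temperature distribution from the histogram section (classes 10-20 to 50-60) and assumes equal class widths, so the two hypothetical zero-frequency classes are 0-10 and 60-70.

```python
# Classes and frequencies from the daily high temperature example.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]

width = classes[0][1] - classes[0][0]           # equal class width assumed
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Plot frequency against class mark, then close the polygon with the
# mid-points of one zero-frequency class on either side.
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

print(points)
```

Joining these points in order with straight line segments gives the polygon.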
Cumulative Frequency Curve “Ogive”:
• A cumulative frequency curve, or ‘Ogive’, is a graphical presentation of the cumulative frequency distribution of a continuous variable.
• It allows us to quickly estimate the number of observations
that are less than or equal to a particular value.

There are two types of ogives:


• Less than Ogive
• More than Ogive.
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
The Cumulative Frequency Distribution
Class    Frequency    Less than cf    More than cf
10-20    3            3               20
20-30    6            9               17
30-40    5            14              11
40-50    4            18              6
50-60    2            20              2
Total    20
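As a check on the table, the frequencies and both cumulative frequency columns can be recomputed from the ordered array. The sketch below assumes that each class includes its lower limit and excludes its upper limit (so 30 is counted in 30-40), which is consistent with the frequencies shown.

```python
from itertools import accumulate

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

# Assumption: lower limit inclusive, upper limit exclusive.
frequencies = [sum(lower <= x < upper for x in data) for lower, upper in classes]

# Less than cf: running total from the first class onward.
less_than_cf = list(accumulate(frequencies))

# More than cf: running total taken from the last class backward.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), f, lt, mt in zip(classes, frequencies, less_than_cf, more_than_cf):
    print(f"{lower}-{upper}  {f:>2}  {lt:>2}  {mt:>2}")
```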

[Figure: less than ogive and more than ogive for the distribution above; y-axis: cumulative frequency (0 to 20), x-axis: class boundaries]
Less than Cumulative Frequency Curve
• Less than cumulative frequencies are plotted
along the Y-axis against the upper boundaries of
the respective classes along the X-axis. That is, plot the points
(x, y), where x is the upper limit of a class and y is its less than cumulative frequency.
• The points so obtained are joined by a
smooth curve.
• The less than ogive looks like an
elongated S and rises from left to right.
[Figure: less than ogive for the data and cumulative frequency table above; y-axis: cumulative frequency (0 to 20)]
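To illustrate how a less than ogive is read, the sketch below builds the plotted points from the cumulative frequency table and estimates the number of observations below a given value. Two simplifications are assumed here: the curve is started at the lower boundary of the first class with cumulative frequency 0, and the points are joined by straight lines rather than a smooth curve. The function name `estimate_less_than` is hypothetical.

```python
# (upper boundary, less than cf) pairs from the table, preceded by an
# assumed starting point at the lower boundary of the first class.
ogive_points = [(10, 0), (20, 3), (30, 9), (40, 14), (50, 18), (60, 20)]


def estimate_less_than(value, points):
    """Estimate the less than cumulative frequency at `value`
    by linear interpolation between neighbouring ogive points."""
    if value <= points[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    # Beyond the last boundary every observation has been counted.
    return float(points[-1][1])


print(estimate_less_than(35, ogive_points))
```

For a value of 35 the estimate falls halfway between the cumulative frequencies 9 and 14. Because it is read from grouped data, it is an approximation and need not equal the exact count in the ordered array.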
More than Cumulative Frequency Curve
• More than cumulative frequencies are plotted
along the Y-axis against the lower boundaries of
the respective classes along the X-axis. That is, plot the points
(x, y), where x is the lower limit of a class and y is its more than cumulative frequency.
• The points so obtained are joined by a
smooth curve.
• The more than ogive slopes down from
left to right.
[Figure: more than ogive for the data and cumulative frequency table above; y-axis: cumulative frequency (0 to 20)]
Shape of the Distribution
• The shape of the distribution is said to be symmetric if the
observations are balanced, or evenly distributed, about the
center.
[Figure: Symmetric Distribution; y-axis: Frequency, x-axis: values 1 to 9]
Skewed
• The shape of the distribution is said to be skewed if the
observations are not symmetrically distributed around the
center.
• A positively skewed distribution (skewed to the right) has a tail
that extends to the right, in the direction of positive values.
[Figure: Positively Skewed Distribution; y-axis: Frequency, x-axis: values 1 to 9]
• A negatively skewed distribution (skewed to the left) has a tail that
extends to the left, in the direction of negative values.
[Figure: Negatively Skewed Distribution; y-axis: Frequency, x-axis: values 1 to 9]
Stem-and-Leaf Diagram
• A simple way to see distribution details in a data set.
• Separate each value in the sorted data series into its leading digits (the
stem) and its trailing digits (the leaf).
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Here, use the 10’s digit for the stem unit:
  - 21 is shown as stem 2, leaf 1
  - 38 is shown as stem 3, leaf 8
• Completed stem-and-leaf diagram:
Stem    Leaves
2       1 4 4 6 7 7
3       0 2 8
4       1
Using Other Stem Units
• Using the 100’s digit as the stem:
Data:
613, 632, 717, 722, 750, 827, 841, 859, 863, 891, 906, 928,
933, 955, 1034, 1047, 1056, 1140, 1169, 1224
• The completed stem-and-leaf display:
Stem    Leaves
6       13 32
7       17 22 50
8       27 41 59 63 91
9       06 28 33 55
10      34 47 56
11      40 69
12      24
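Both displays can be reproduced with one small routine. The sketch below is illustrative only; the function name `stem_and_leaf` and its `stem_unit` parameter (10 or 100) are hypothetical. Stems with no observations are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(data, stem_unit):
    """Group values by stem (value // stem_unit); leaves are the remainders.

    Leaves are zero-padded so that, with stem_unit=100, the value 906
    is shown as stem 9, leaf 06.
    """
    leaf_width = len(str(stem_unit)) - 1  # 1 digit for unit 10, 2 for unit 100
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, stem_unit)
        rows[stem].append(str(leaf).zfill(leaf_width))
    return dict(rows)


small = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
large = [613, 632, 717, 722, 750, 827, 841, 859, 863, 891,
         906, 928, 933, 955, 1034, 1047, 1056, 1140, 1169, 1224]

small_display = stem_and_leaf(small, 10)
large_display = stem_and_leaf(large, 100)

for display in (small_display, large_display):
    for stem, leaves in display.items():
        print(f"{stem:>2} | {' '.join(leaves)}")
    print()
```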
Relationships Between Variables
• Graphs illustrated so far have involved only a single variable.
• When two variables exist, other techniques are used:
  - Categorical (qualitative) variables: cross tables
  - Numerical (quantitative) variables: scatter plots
Dot Plot
A dot plot (or dot graph) is one of the many types of graphs and charts
used to organize statistical data, and it is useful whether the variable is
quantitative or categorical. It uses dots to represent data. A dot plot is
used for relatively small data sets whose values fall into a number of
discrete categories. If a value appears more than once, its dots are
stacked one above the other, so that the height of each column of dots shows
the frequency of that value.
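A rough text rendering shows how the stacked dots encode frequency. The sketch below reuses the ten values from the stem-and-leaf example as its data; the function name `dot_plot` and the use of `*` for a dot are choices made only for this illustration.

```python
from collections import Counter

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]


def dot_plot(values):
    """Return the lines of a text dot plot: one column per distinct value,
    one dot per occurrence, with the value labels on the bottom line."""
    counts = Counter(values)
    labels = sorted(counts)
    height = max(counts.values())
    lines = []
    # Build the plot from the top level down to the first level.
    for level in range(height, 0, -1):
        cells = [" * " if counts[v] >= level else "   " for v in labels]
        lines.append(" ".join(cells))
    lines.append(" ".join(f"{v:>3}" for v in labels))
    return lines


lines = dot_plot(data)
print("\n".join(lines))
```

Here the values 24 and 27 each appear twice, so their columns are two dots high, while every other column has a single dot.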
Line Graph
A line graph displays data that change continuously over
time. It consists of data points connected by line segments to show a
trend (continuous change). Line graphs have an x-axis and a y-axis. In
most cases, time is plotted on the horizontal axis.
Scatter Plot
The scatter plot is an X-Y diagram that shows the relationship between two
variables (paired numerical data). Each pair of values is plotted as a point
against a horizontal and a vertical axis. The purpose is to show how strongly
one variable is related to the other. Scatter plots also help you predict the
behavior of one variable (dependent) based on the value of the other variable
(independent).
