Frequency Distribution - Data Management PDF

The document discusses mathematical data management and statistics concepts. It defines data management as developing architectures, policies and procedures to effectively manage enterprise information. Statistics examines ways to process and analyze collected data. The document then provides examples of how to construct frequency distributions from raw data by grouping the data into classes or categories and tallying the frequency of observations in each class. This provides meaningful insights for decision-makers.

Uploaded by

Polly Vicente
Copyright
© All Rights Reserved

Mathematical Data Management

John Irish G. Lira, PhD


What is Data Management?

Data Management is the development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner.

Data Management Association International 2019


What is Statistics?

A branch of Mathematics that examines and investigates ways to process and analyze the data gathered.

It provides procedures for data collection, presentation, organization, and interpretation in order to produce meaningful ideas that are useful to decision-makers.
Frequency Distribution and Graph
What is Frequency Distribution?

It is a grouping of data into categories, presented in tabular form, showing the number of observations in each of the non-overlapping (mutually exclusive) classes.
Grouped Frequency Distribution

It is used when the range of the data set is large.

The data are grouped into classes:

✓ Categorical

✓ Interval or Ratio
Constructing Frequency Distribution

Grouped Frequency
Categorical Frequency
Grouped Frequency Distribution

Determining Class Interval


Categorical Frequency

It is used to organize nominal-level or ordinal-level data.

Examples:
Gender, Political affiliation, Business type, Year level
Example 1

Twenty applicants were given a performance evaluation appraisal. The data set is

High High High Low Average
Average Low Average Average Average
Low Average Average High High
Low Low Average High High

Construct a frequency distribution for the data.


Step 1

Construct a table.

Class Tally Frequency Percent


High
Average
Low
Step 2

Tally the raw data.

Class Tally Frequency Percent


High IIII-II
Average IIII-III
Low IIII

High High High Low Average


Average Low Average Average Average
Low Average Average High High
Low Low Average High High
Step 3

Convert the tallied data into numerical frequencies.

Class Tally Frequency


High IIII-II 7
Average IIII-III 8
Low IIII 5
Step 4

Determine the percentage.


Class Tally Frequency Percent
High IIII-II 7 35
Average IIII-III 8 40
Low IIII 5 25

Percentage Formula:

$\% = \frac{f}{n} \times 100\%$

where f is the frequency of the class and n is the total number of values.
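The four steps of Example 1 can be sketched in Python. This is only an illustration: the names `ratings`, `counts`, and `table` are assumptions made for the sketch and do not come from the slides.

```python
from collections import Counter

# Ratings of the twenty applicants from Example 1.
ratings = (
    "High High High Low Average "
    "Average Low Average Average Average "
    "Low Average Average High High "
    "Low Low Average High High"
).split()

counts = Counter(ratings)  # frequency f of each class
n = len(ratings)           # total number of values

# Percentage of each class: f / n * 100 (multiplying first keeps the arithmetic exact here).
table = {cls: (counts[cls], counts[cls] * 100 / n) for cls in ("High", "Average", "Low")}

for cls, (f, pct) in table.items():
    print(f"{cls:<8}{f:>3}{pct:>6.0f}")
```

The printed frequencies and percentages should agree with the Step 4 table.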
Determining Class Interval

Rule 1: $2^k \ge n$, with $i = \frac{\text{Range}}{\text{No. of Classes}} = \frac{HV - LV}{k}$

Rule 2: $i = \frac{\text{Range}}{1 + 3.322 \log N}$

Rule 3: $i = \frac{\text{Range}}{\text{No. of Classes}}$

Note: i is the suggested class interval.
Example 2

Suppose a researcher wished to do a study on the monthly salary (in ₱ thousands) of call center agents of selected call center companies. The researcher would first have to collect the data by asking each call center agent about their monthly salary. The data collected in original form is called raw data. In this case, the data are
Example 2 (continuation)

18.80 22.00 23.40 24.30 27.00 27.90 31.00 26.00 20.80 17.00
20.00 22.60 23.40 24.50 27.00 29.30 32.10 26.10 21.00 17.30
20.25 22.75 23.70 24.70 27.40 30.10 33.70 26.30 21.60 17.80
18.40 21.90 23.00 23.85 26.80 27.80 30.80 25.00 20.40 15.50
18.70 21.90 23.20 24.10 26.90 27.90 30.90 25.20 20.50 15.70
17.95 21.75 22.90 23.70 26.50 27.50 30.60 24.75 20.25 14.10
18.35 21.80 22.90 23.70 26.50 27.60 30.75 25.00 20.30 14.30
20.20 22.80 23.50 24.60 27.30 29.50 32.90 26.20 21.30 17.40
Example 2 (continuation)

Construct a frequency distribution using Rule 1 and determine the following:
Range
Interval
Class limits
Class boundaries
Relative frequencies
Percentage
Midpoints
Cumulative frequencies
Step 1

Arrange the raw data in ascending or descending order.
14.10 17.95 20.25 21.75 22.90 23.70 24.75 26.50 27.50 30.60
14.30 18.35 20.30 21.80 22.90 23.70 25.00 26.50 27.60 30.75
15.50 18.40 20.40 21.90 23.00 23.85 25.00 26.80 27.80 30.80
15.70 18.70 20.50 21.90 23.20 24.10 25.20 26.90 27.90 30.90
17.00 18.80 20.80 22.00 23.40 24.30 26.00 27.00 27.90 31.00
17.30 20.00 21.00 22.60 23.40 24.50 26.10 27.00 29.30 32.10
17.40 20.20 21.30 22.75 23.50 24.60 26.20 27.30 29.50 32.90
17.80 20.25 21.60 22.80 23.70 24.70 26.30 27.40 30.10 33.70
Step 2

Determine the classes

✓ Find the Highest Value (HV) and Lowest Value (LV) in the data set.
HV = 33.70 and LV = 14.10

✓ Find the Range
Range = HV – LV = 33.70 – 14.10 = 19.60

✓ Determine the number of classes using the $2^k \ge n$ rule
Determining the number of classes

$2^k \ge n$ (2 raised to the power of k)

✓ When k = 6: $2^6 = 64$, and $64 \ge 80$ is false, so 6 classes are too few.

✓ When k = 7: $2^7 = 128$, and $128 \ge 80$ is true.

Thus, the recommended no. of classes is 7.


Determine the class interval (or width)

$i = \frac{\text{Range}}{\text{No. of Classes}} = \frac{HV - LV}{k}$

$i = \frac{33.70 - 14.10}{7} = \frac{19.60}{7} = 2.80 \approx 3$

Thus, the interval is 3.
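The Rule 1 computation can be checked with a short sketch. The helper name `number_of_classes` is an assumption for illustration, and rounding to the nearest whole number is assumed because the slide rounds 2.80 to 3.

```python
def number_of_classes(n):
    """Return the smallest k such that 2**k >= n (Rule 1)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

n = 80                 # number of observations in Example 2
hv, lv = 33.70, 14.10  # highest and lowest values

k = number_of_classes(n)
interval = (hv - lv) / k     # Range / No. of Classes
suggested = round(interval)  # assumed: round to the nearest whole number

print(k, round(interval, 2), suggested)
```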


Step 2 (continuation)

Select a starting point for the lowest class limit

14

14.10 17.95 20.25 21.75 22.90 23.70 24.75 26.50 27.50 30.60
14.30 18.35 20.30 21.80 22.90 23.70 25.00 26.50 27.60 30.75
15.50 18.40 20.40 21.90 23.00 23.85 25.00 26.80 27.80 30.80
15.70 18.70 20.50 21.90 23.20 24.10 25.20 26.90 27.90 30.90
17.00 18.80 20.80 22.00 23.40 24.30 26.00 27.00 27.90 31.00
17.30 20.00 21.00 22.60 23.40 24.50 26.10 27.00 29.30 32.10
17.40 20.20 21.30 22.75 23.50 24.60 26.20 27.30 29.50 32.90
17.80 20.25 21.60 22.80 23.70 24.70 26.30 27.40 30.10 33.70
Step 2 (continuation)

Determine Lower and Upper class limits

Class Limits (Lower Limit – Upper Limit)
12 – 14
15 – 17
18 – 20
21 – 23
24 – 26
27 – 29
30 – 32
33 – 35
Step 2 (continuation)

Determine the Class Boundaries


Class Limits    Class Boundaries
12 – 14         11.5 – 14.5
15 – 17         14.5 – 17.5
18 – 20         17.5 – 20.5
21 – 23         20.5 – 23.5
24 – 26         23.5 – 26.5
27 – 29         26.5 – 29.5
30 – 32         29.5 – 32.5
33 – 35         32.5 – 35.5

Lower boundary: 12 – 0.5 = 11.5
Upper boundary: 20 + 0.5 = 20.5
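The class limits and class boundaries above follow a regular pattern that can be generated mechanically. The sketch below assumes whole-number limits that are one unit apart, so each boundary lies 0.5 beyond its limit; the function name `class_table` is hypothetical.

```python
def class_table(lowest_limit, width, n_classes):
    """Build (class limits, class boundaries) pairs for whole-number limits."""
    rows = []
    for j in range(n_classes):
        lower = lowest_limit + j * width
        upper = lower + width - 1  # limits are inclusive, e.g. 12 to 14 for width 3
        rows.append(((lower, upper), (lower - 0.5, upper + 0.5)))
    return rows

rows = class_table(12, 3, 8)
for (lo, hi), (lb, ub) in rows:
    print(f"{lo} - {hi}    {lb} - {ub}")
```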
Step 3

Tally the raw data

Class Limits    Real Boundaries        Tally
12 – 14         11.4445 – 14.4444      II
15 – 17         14.4445 – 17.4444      IIII
18 – 20         17.4445 – 20.4444      IIII-IIII-II
21 – 23         20.4445 – 23.4444      IIII-IIII-IIII-IIII
24 – 26         23.4445 – 26.4444      IIII-IIII-IIII-III
27 – 29         26.4445 – 29.4444      IIII-IIII-IIII
30 – 32         29.4445 – 32.4444      IIII-III
33 – 35         32.4445 – 35.4444      II
Step 4

Convert the tallied data to numerical frequencies

Class Limits Tally Frequency


12 – 14 II 2
15 – 17 IIII 5
18 – 20 IIII-IIII-II 12
21 – 23 IIII-IIII-IIII-IIII 19
24 – 26 IIII-IIII-IIII-III 18
27 – 29 IIII-IIII-IIII 14
30 – 32 IIII-III 8
33 – 35 II 2
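The tally can be reproduced from the raw data. This sketch assumes the "Real Boundaries" column of the tally table (0.5555 below each lower limit, 0.4444 above each upper limit) decides where a value falls; the names `salaries`, `classes`, and `frequency` are illustrative only.

```python
# Raw monthly salaries (in thousands) from Example 2.
salaries = [
    18.80, 22.00, 23.40, 24.30, 27.00, 27.90, 31.00, 26.00, 20.80, 17.00,
    20.00, 22.60, 23.40, 24.50, 27.00, 29.30, 32.10, 26.10, 21.00, 17.30,
    20.25, 22.75, 23.70, 24.70, 27.40, 30.10, 33.70, 26.30, 21.60, 17.80,
    18.40, 21.90, 23.00, 23.85, 26.80, 27.80, 30.80, 25.00, 20.40, 15.50,
    18.70, 21.90, 23.20, 24.10, 26.90, 27.90, 30.90, 25.20, 20.50, 15.70,
    17.95, 21.75, 22.90, 23.70, 26.50, 27.50, 30.60, 24.75, 20.25, 14.10,
    18.35, 21.80, 22.90, 23.70, 26.50, 27.60, 30.75, 25.00, 20.30, 14.30,
    20.20, 22.80, 23.50, 24.60, 27.30, 29.50, 32.90, 26.20, 21.30, 17.40,
]

# Class limits 12-14, 15-17, ..., 33-35.
classes = [(lo, lo + 2) for lo in range(12, 34, 3)]

def frequency(lo, hi, data):
    """Count values inside the real boundaries lo - 0.5555 to hi + 0.4444."""
    return sum(1 for x in data if lo - 0.5555 <= x <= hi + 0.4444)

freqs = [frequency(lo, hi, salaries) for lo, hi in classes]
print(freqs)
```

With these boundaries a value such as 20.50 is counted in the 21 – 23 class, which is how the slide's frequencies add up to 80.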
Step 5

Determine the relative frequency (rf)


Class Limits    Frequency    Relative Frequency
12 – 14         2            0.0250
15 – 17         5            0.0625
18 – 20         12           0.1500
21 – 23         19           0.2375
24 – 26         18           0.2250
27 – 29         14           0.1750
30 – 32         8            0.1000
33 – 35         2            0.0250
Total           80           1.00
8 ÷ 80 = 0.10
Step 6

Determine the percentage


Class Limits    Frequency    Percentage
12 – 14         2            2.50
15 – 17         5            6.25
18 – 20         12           15.00
21 – 23         19           23.75
24 – 26         18           22.50
27 – 29         14           17.50
30 – 32         8            10.00
33 – 35         2            2.50
Total           80           100
(8 ÷ 80) x 100% = 10%
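Relative frequency and percentage both come from dividing each class frequency by the total. A minimal sketch, using the Example 2 frequencies (the variable names are assumptions):

```python
freqs = [2, 5, 12, 19, 18, 14, 8, 2]  # class frequencies from Step 4
n = sum(freqs)                        # total number of values

rel_freqs = [f / n for f in freqs]          # relative frequency: f / n
percentages = [f * 100 / n for f in freqs]  # percentage: (f / n) * 100

for f, rf, pct in zip(freqs, rel_freqs, percentages):
    print(f"{f:>3}  {rf:.4f}  {pct:.2f}")
```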
Step 7

Determine the cumulative frequencies (cf)


Class Limits    f     cf    Found by
12 – 14         2     2     2
15 – 17         5     7     2 + 5
18 – 20         12    19    2 + 5 + 12
21 – 23         19    38    2 + 5 + 12 + 19
24 – 26         18    56    2 + 5 + 12 + 19 + 18
27 – 29         14    70    2 + 5 + 12 + 19 + 18 + 14
30 – 32         8     78    2 + 5 + 12 + 19 + 18 + 14 + 8
33 – 35         2     80    2 + 5 + 12 + 19 + 18 + 14 + 8 + 2
Total           80
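The "Found by" column is a running total, which Python's standard library provides directly. A short sketch with the Example 2 frequencies:

```python
from itertools import accumulate

freqs = [2, 5, 12, 19, 18, 14, 8, 2]  # class frequencies
cum_freqs = list(accumulate(freqs))   # running totals, one per class

print(cum_freqs)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.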
Step 8

Determine the midpoints (X)


Class Limits    f     X     Found by
12 – 14         2     13    (12 + 14) ÷ 2
15 – 17         5     16    (15 + 17) ÷ 2
18 – 20         12    19    (18 + 20) ÷ 2
21 – 23         19    22    (21 + 23) ÷ 2
24 – 26         18    25    (24 + 26) ÷ 2
27 – 29         14    28    (27 + 29) ÷ 2
30 – 32         8     31    (30 + 32) ÷ 2
33 – 35         2     34    (33 + 35) ÷ 2
Total           80
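Each midpoint is the average of the lower and upper class limits. A minimal sketch for the Example 2 classes (variable names are illustrative):

```python
# Class limits 12-14, 15-17, ..., 33-35.
classes = [(lo, lo + 2) for lo in range(12, 34, 3)]

# Midpoint X = (lower limit + upper limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in classes]

print(midpoints)
```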
Example 3

SJS Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The owner wants additional information on the ages of those people taking travel tours. A random sample of 50 customers taking travel tours last summer revealed these ages.
Example 3 (continuation)

18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48

Construct a frequency distribution using Rule 2.
Step 1

Arrange the raw data in ascending or descending order.

18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Step 2

✓ Find the Highest Value (HV) and Lowest Value (LV) in the data set.
HV = 77 and LV = 18

✓ Find the Range
Range = HV – LV = 77 – 18 = 59
Step 2 (continuation)

✓ Determine the class interval using Rule 2

$i = \frac{\text{Range}}{1 + 3.322 \log N} = \frac{77 - 18}{1 + 3.322(\log 50)} = \frac{59}{1 + 3.322(1.698970004)} = \frac{59}{6.643978354} = 8.88 \approx 9$
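The Rule 2 arithmetic can be verified numerically. The sketch assumes that "log" on the slide means the base-10 logarithm, which is consistent with log 50 = 1.698970004.

```python
import math

n = 50           # sample size in Example 3
hv, lv = 77, 18  # highest and lowest ages

denominator = 1 + 3.322 * math.log10(n)
interval = (hv - lv) / denominator  # Rule 2: Range / (1 + 3.322 log N)
suggested = round(interval)

print(round(denominator, 4), round(interval, 2), suggested)
```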
Step 2 (continuation)

Select a starting point for the lowest class limit

18

18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Step 2 (continuation)

Determine Lower and Upper class limits

Class Limits (Lower Limit – Upper Limit)
18 – 26
27 – 35
36 – 44
45 – 53
54 – 62
63 – 71
72 – 80
Step 2 (continuation)

Determine the Class Boundaries


Class Limits    Class Boundaries
18 – 26         17.5 – 26.5
27 – 35         26.5 – 35.5
36 – 44         35.5 – 44.5
45 – 53         44.5 – 53.5
54 – 62         53.5 – 62.5
63 – 71         62.5 – 71.5
72 – 80         71.5 – 80.5

Lower boundary: 18 – 0.5 = 17.5
Upper boundary: 44 + 0.5 = 44.5
Step 3

Tally the raw data

Class Limits Class Boundaries Tally


18 – 26 17.5 – 26.5 III
27 – 35 26.5 – 35.5 IIII
36 – 44 35.5 – 44.5 IIII-IIII
45 – 53 44.5 – 53.5 IIII-IIII-IIII
54 – 62 53.5 – 62.5 IIII-IIII-I
63 – 71 62.5 – 71.5 IIII-I
72 – 80 71.5 – 80.5 II
Step 4

Convert the tallied data to numerical frequencies

Class Limits Tally Frequency


18 – 26 III 3
27 – 35 IIII 5
36 – 44 IIII-IIII 9
45 – 53 IIII-IIII-IIII 14
54 – 62 IIII-IIII-I 11
63 – 71 IIII-I 6
72 – 80 II 2
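Because the ages are whole numbers, each one can be assigned to a class directly from the class limits. A minimal sketch of the tally for Example 3 (the names `ages`, `classes`, and `freqs` are assumptions):

```python
# Ages of the 50 sampled customers from Example 3.
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

# Class limits 18-26, 27-35, ..., 72-80 (interval of 9).
classes = [(lo, lo + 8) for lo in range(18, 73, 9)]

# Count how many ages fall within each pair of inclusive limits.
freqs = [sum(1 for a in ages if lo <= a <= hi) for lo, hi in classes]

print(freqs)
```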
Step 5

Determine the relative frequency (rf)


Class Limits Frequency Relative Frequency
18 – 26 3 0.06
27 – 35 5 0.10
36 – 44 9 0.18
45 – 53 14 0.28
54 – 62 11 0.22
63 – 71 6 0.12
72 – 80 2 0.04
Total 50 1.00

2 ÷ 50 = 0.04
Step 6

Determine the percentage


Class Limits Frequency Percentage
18 – 26 3 6
27 – 35 5 10
36 – 44 9 18
45 – 53 14 28
54 – 62 11 22
63 – 71 6 12
72 – 80 2 4
Total 50 100

(2 ÷ 50) x 100 = 4
Step 7

Determine the cumulative frequencies (cf)


Class Limits f cf Found by
18 – 26 3 3 3
27 – 35 5 8 3+5
36 – 44 9 17 3+5+9
45 – 53 14 31 3 + 5 + 9 + 14
54 – 62 11 42 3 + 5 + 9 + 14 + 11
63 – 71 6 48 3 + 5 + 9 + 14 + 11 + 6
72 – 80 2 50 3 + 5 + 9 + 14 + 11 + 6 + 2
Total 50
Step 8

Determine the midpoints (X)


Class Limits f X Found by
18 – 26 3 22 (18 + 26) ÷ 2
27 – 35 5 31 (27 + 35) ÷ 2
36 – 44 9 40 (36 + 44) ÷ 2
45 – 53 14 49 (45 + 53) ÷ 2
54 – 62 11 58 (54 + 62) ÷ 2
63 – 71 6 67 (63 + 71) ÷ 2
72 – 80 2 76 (72 + 80) ÷ 2
Total 50
What is a Stem-and-Leaf plot?

This method to some extent overcomes the loss of actual observations brought about by the histogram.
The advantage of the stem-and-leaf plot over the histogram is that we can see the actual observations.
It was introduced by John Tukey.

The stem is the leading digit or digits.

The leaf is the trailing digit.
Example 3 (Stem and Leaf)

NU Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The owner wants additional information on the ages of those people taking travel tours. A random sample of 50 customers taking travel tours last summer revealed these ages.
Example 3 (continuation)

18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48

Construct a stem-and-leaf plot.


Example 3 (Stem and Leaf)

Stem    Leaf
1       8, 9
2       4, 7, 8, 9
3       1, 4, 6, 6, 7, 8, 9, 9
4       0, 2, 4, 5, 6, 6, 7, 8, 8, 8, 9, 9
5       0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9
6       0, 1, 2, 3, 4, 6, 7, 8
7       0, 4, 7

Stem: tens digit (leading digits). Leaf: units digit (trailing digits).
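A stem-and-leaf plot for two-digit data can be built by splitting each value into its tens digit and units digit. The sketch below is one possible way to do it; the name `plot` is illustrative.

```python
from collections import defaultdict

# Ages of the 50 sampled customers.
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

plot = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)  # tens digit is the stem, units digit is the leaf
    plot[stem].append(leaf)

for stem in sorted(plot):
    print(stem, "|", ", ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the data first keeps the leaves of every stem in ascending order.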
Graphing Frequency Distribution

Histogram
Frequency Polygon
Cumulative Frequency or Ogive
Example 3

Let us consider the middle income of 80 families living in the National Capital Region.
Class Limits Class Boundaries X f cf
18 – 26 17.5 – 26.5 22 4 4
27 – 35 26.5 – 35.5 31 9 13
36 – 44 35.5 – 44.5 40 16 29
45 – 53 44.5 – 53.5 49 23 52
54 – 62 53.5 – 62.5 58 17 69
63 – 71 62.5 – 71.5 67 8 77
72 – 80 71.5 – 80.5 76 3 80

Construct a histogram, frequency polygon, and cumulative frequency polygon.
Histogram

A graph in which the classes are marked on the horizontal axis (x-axis) and the class frequencies on the vertical axis (y-axis).

[Figure: Histogram of Middle Income Families at NCR. x-axis: Midpoints 22, 31, 40, 49, 58, 67, 76 (Salary in Thousands); y-axis: Frequency, 0 to 25]
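As a rough stand-in for the chart, the frequencies of the 80 families can be drawn as a text histogram. This sketch draws the bars horizontally, one row per class midpoint, which differs from the vertical bars of the slide; it assumes only the midpoints and frequencies from the table.

```python
midpoints = [22, 31, 40, 49, 58, 67, 76]  # class midpoints X
freqs = [4, 9, 16, 23, 17, 8, 3]          # class frequencies f

# One row per class: midpoint, then a bar whose length equals the frequency.
bars = [f"{x:>3} | {'#' * f} {f}" for x, f in zip(midpoints, freqs)]

print("\n".join(bars))
```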
Frequency Polygon

A graph that displays the data using points which are connected by lines.

[Figure: Frequency Polygon for Call Center Agents' Salary. x-axis: Midpoints 15, 18, 21, 24, 27, 30, 33 (Salary in Thousands); y-axis: Frequency, 0 to 25]
Cumulative Frequency Polygon

A graph that displays the cumulative frequencies for the classes in a frequency distribution.

[Figure: Ogive for Call Center Agents' Salary. x-axis: Upper Class Boundaries 16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5 (Real Limit, Salary in Thousands); y-axis: Cumulative Frequency, 0 to 100]


Other Types of Graphs/Charts

Pareto Chart
Bar Chart (Bar Graph)
Pie Chart (Circle Graph)
Time Series Graph
Pictograph
Scatter Plot
Example 4

Using the information in the table below about the favorite snacks of 870 youths, construct a Pareto chart, bar chart, and pie chart.

Products Sales
Junk Foods 135
Candy 250
Ice Cream 185
Chocolate 210
Others 90
Pareto Chart

It represents a frequency distribution for categorical (or nominal-level) data. Frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

[Figure: Pareto chart "Favorite Snacks". x-axis: Products in the order Candy, Chocolate, Ice Cream, Junk Foods, Others; y-axis: Sales (in Millions), 0 to 300]
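The only preparation a Pareto chart needs is sorting the categories from highest to lowest frequency. A minimal sketch using the Example 4 table (the names `sales` and `pareto_order` are assumptions):

```python
# Example 4 data, in the order given in the table.
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}

# Pareto order: bars arranged from the highest value to the lowest.
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for product, value in pareto_order:
    print(f"{product:<12}{value:>5}")
```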
Bar Chart (Bar Graph)

The bases of the rectangles are arbitrary intervals whose centers are the codes. The height of each rectangle represents the frequency of that category. It is also applicable for categorical (or nominal-level) data.

[Figure: Bar chart "Favorite Snacks". x-axis: Products in the order Junk Foods, Candy, Ice Cream, Chocolate, Others; y-axis: Sales (in Millions), 0 to 300]
Pie Chart (Circle Graph)

A circle divided into portions that represent the relative frequencies (or percentages) of the data belonging to different categories. The data in a pie chart should be categorical or nominal-level.

[Figure: Pie chart "Favorite Snacks". Candy 29%, Chocolate 24%, Ice Cream 21%, Junk Foods 16%, Others 10%]
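The percentages shown in the pie chart can be recomputed from the Example 4 table. The sketch also derives the central angle of each slice as the same share of 360 degrees; the angle calculation is an addition for illustration and does not appear on the slides.

```python
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}
total = sum(sales.values())

# Share of each category, rounded to a whole percent as in the chart labels.
percent = {name: round(value * 100 / total) for name, value in sales.items()}

# Central angle of each slice in degrees (assumed extension, not from the slides).
angle = {name: value * 360 / total for name, value in sales.items()}

for name in sales:
    print(f"{name:<12}{percent[name]:>4}%{angle[name]:>8.1f} deg")
```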
Time Series Graph

It represents data that occur over a specific period of time under observation.

It shows a trend or pattern of increase or decrease over the period of time.
Example for Time Series Graph

Using the information in the table below about the dollar to peso exchange rate from January to December of 2009, construct a time series graph.

Month     Peso/US Dollar Exchange Rate
Jan       41
Feb       42
March     43
April     46
May       44
June      45
July      43
August    42
Sept      45
Oct       44
Nov       45
Dec       43
Example for Time Series Graph

[Figure: Time series graph "Peso-US Dollar Exchange Rate". x-axis: Months, Jan to Dec; y-axis: Peso per US Dollar, 38 to 47]
Pictograph

It immediately suggests the nature of the data being shown.

It is a combination of the attention-getting quality and the accuracy of the bar chart.

Appropriate pictures arranged in a row (sometimes in a column) present the quantities for comparison.
Example for Pictograph

VSAS Realty Inc. is a real estate company that develops housing in Rizal province. The information in the table shows the number of houses constructed from 2005 to 2009. Construct a pictograph.

Year             2005    2006    2007    2008    2009
No. of Houses    400     250     600     550     700
Example for Pictograph

[Figure: Pictograph of the number of houses per year. x-axis: Year, 2005 to 2009; y-axis: No. of houses, 100 to 800. Legend: one picture = 100 houses]
Scatter Plot

It is used to examine possible relationships between two numerical variables.

The two variables are plotted on the x-axis and y-axis.


Example for Scatter Diagram

The owner of a chain of halo-halo stores would like to study the effect of atmospheric temperature on sales during the summer season. A random sample of 12 days is selected with the results given as follows:

Day 1 2 3 4 5 6 7 8 9 10 11 12
Temperature (°F) 79 76 78 84 90 83 93 94 97 85 88 82
Total Sales 147 143 147 168 206 155 192 211 209 187 200 150
Guidelines for Developing Graphs/Charts

✓ The graph or chart should include a title.

✓ The scales for all axes should be included.

✓ The scale on the y-axis should start at zero.

✓ The graph or chart should not distort the data.

✓ The x-axis and y-axis should be properly labeled.

✓ The graph or chart should not contain unnecessary decorations.

✓ The simplest possible graph or chart should be used for any data set.
Example for Scatter Plot

[Figure: Scatter plot. x-axis: Temperature (X), 0 to 90; y-axis: Sales (Y), 0 to 225]
Statistics: The only Science that enables
different experts using the same figure to
draw different conclusions.
– Evan Esar
