UCCM2233 - Chp2 Organizing Data - Wble

This chapter discusses organizing and graphing both qualitative and quantitative data. For qualitative data, it describes how to create frequency distributions, relative frequency and percentage distributions, and how to present the data graphically using bar charts and pie charts. For quantitative data, it discusses organizing small data sets using dotplots and stem-and-leaf displays, creating frequency distributions for single-valued and grouped data, and using histograms, polygons, and cumulative frequency distributions to graph the data. Statistical software like Excel and R can be used to analyze and visualize both types of data.

Uploaded by

VS Shirley

Chapter 2

Organizing Data
Learning Outcome
• Organize data in an intelligible manner from a probabilistic and statistical point of view.
• Use statistical software to do simple exploratory data analysis.
Contents
2.1 Some Definitions.
2.2 Organizing and Graphing Qualitative Data.
2.2.1 Frequency distributions for qualitative data
2.2.2 Relative frequency and percentage distributions
2.2.3 Graphical presentation of qualitative data
2.3 Organizing and Graphing Quantitative Data
2.3.1 Small data set
 Dotplot displays
 Stem-and-leaf displays
2.3.2 Single-Valued Classes
2.3.3 Grouped data
 Frequency Distribution for quantitative data
 Relative frequency and percentage distributions
 Histogram
 Polygon
2.3.4 Cumulative frequency distribution
 Ogive / Cumulative frequency curve
Variable
• Quantitative (numerical, measurable)
  - Discrete: the data can take only certain values (a finite or countable number), with no intermediate values, e.g. number of trials before success.
  - Continuous: the data can take any value within a range, e.g. height 167.5 cm.
• Qualitative (categorical, not measurable)
  - Nominal: refers to a characteristic, e.g. gender, race, yes/no.
  - Ordinal: the categories can be placed in order, e.g. preference, importance.
2.1 Some Definitions
 Raw Data
 Data in the sequence in which they are collected
and before they are processed or ranked.
 Examples:
Table 1: The weights of 20 students in kg (Quantitative
raw data)

61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71
2.1 Some Definitions (cont'd)
 Raw Data (cont'd)
 Table 2: The grades of UCCM 1213 of 20 students
(Qualitative raw data)

A B C A C B B A B C

B A B B B A C D D B
2.1 Some Definitions (cont'd)
 Arrays
 An arrangement of numerical raw data in
ascending order or descending order of
magnitude.
 Example:

60 61 62 63 64 65 65 66 67 67
68 68 69 69 70 70 71 71 73 74
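As a minimal sketch (assuming Python is available; the variable names are illustrative), the array above can be obtained by sorting the raw weights of Table 1:

```python
# Raw weights (kg) of the 20 students from Table 1.
weights = [61, 68, 65, 67, 68, 71, 69, 63, 74, 64,
           66, 65, 62, 67, 60, 73, 69, 70, 70, 71]

ascending = sorted(weights)                 # array in ascending order
descending = sorted(weights, reverse=True)  # array in descending order

print(ascending)
print(descending)
```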
2.1 Some Definitions (cont'd)
 Ungrouped Data
 Contains information on each member of a
sample or population individually.
 Examples:
Table 1 and Table 2.
 Grouped Data
 Data presented in classes or intervals.
 Example:
UCCM1213 scores      10 – 12   13 – 15   16 – 18   19 – 21
Number of students   4         12        20        14
2.2 Organizing and Graphing Qualitative Data
2.2.1 Frequency Distribution
2.2.2 Relative Frequency and Percentage Distributions
2.2.3 Graphical Presentation
Frequency Distribution for
Qualitative Data
 A tabular arrangement that lists all categories and
the number of elements that belong to each of the
categories.
 Example 2.1

A sample of 25 students who were planning to go to college was taken.
The courses they intended to choose were:
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
Construct a frequency distribution table for these data.

Qualitative Data
Frequency Distribution for Qualitative Data (cont'd)
• Solution:

Course        Tally      Frequency
Biotech       \\\\ \\\
Business
Engineering
Infotech
Others
Total:
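One way to check the tally, sketched here in Python with the standard library (an assumption; the slides themselves use Excel and R), is to count the 25 responses of Example 2.1 directly. The data spell the last category "Other".

```python
from collections import Counter

# The 25 intended courses from Example 2.1, row by row.
courses = [
    "Engineering", "Infotech", "Engineering", "Business", "Business",
    "Business", "Business", "Other", "Biotech", "Biotech",
    "Biotech", "Biotech", "Infotech", "Biotech", "Biotech",
    "Other", "Business", "Engineering", "Business", "Other",
    "Engineering", "Biotech", "Biotech", "Other", "Infotech",
]

frequency = Counter(courses)  # category -> number of students

for course in sorted(frequency):
    print(f"{course:<12}{frequency[course]:>3}")
print(f"{'Total':<12}{sum(frequency.values()):>3}")
```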
Frequency Distribution for Qualitative Data (Excel)
[Figures: Excel screenshots of the procedure, with steps marked "1st drag" and "2nd drag"]

Relative Frequency and Percentage Distributions
• A tabular arrangement that lists the relative frequencies and percentages for all categories.

$$\text{Relative frequency of a category} = \frac{\text{frequency of the category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{Relative frequency} \times 100\%$$

Relative Frequency and Percentage Distributions (cont'd)
• Example 2.2
  - Determine the relative frequency and percentage distributions for the data in Example 2.1.
• Solution:
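The calculation can be sketched in Python as follows. The frequencies used here are not given on the slide; they were counted from the 25 responses in Example 2.1, so treat them as a check against your own tally.

```python
# Frequencies counted from the Example 2.1 data (not stated on the slide).
frequency = {"Biotech": 8, "Business": 6, "Engineering": 4,
             "Infotech": 3, "Other": 4}

total = sum(frequency.values())  # sum of all frequencies

# Relative frequency = f / sum of f; percentage = relative frequency * 100.
relative = {course: f / total for course, f in frequency.items()}
percentage = {course: 100 * r for course, r in relative.items()}

for course in frequency:
    print(f"{course:<12}{relative[course]:>6.2f}{percentage[course]:>7.1f}%")
```

The relative frequencies should sum to 1 and the percentages to 100, which is a quick sanity check on any such table.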

Graphical Presentation of Qualitative Data
• Bar Chart (bar graph)
  - A graph made of bars whose heights represent the frequencies of the respective categories.
• Example 2.3
  - Construct a bar chart for the data in Example 2.1.

Example 2.3

Example 2.4

Graphical Presentation of Qualitative Data
• Pie Chart
  - A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
  - Starts at 12 o'clock.
  - Moves clockwise from the largest sector to the smallest.
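To draw the sectors by hand, each relative frequency is converted to an angle out of 360 degrees. The sketch below assumes the Example 2.1 frequencies (counted from the data, not stated on the slide) and orders the sectors from largest to smallest, starting at 12 o'clock.

```python
# Frequencies counted from the Example 2.1 data (an assumption of this sketch).
frequency = {"Biotech": 8, "Business": 6, "Engineering": 4,
             "Infotech": 3, "Other": 4}
total = sum(frequency.values())

# Largest sector first; ties keep their original order.
sectors = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

angles = {}
start = 0.0  # degrees, measured clockwise from 12 o'clock
for course, f in sectors:
    angles[course] = 360 * f / total  # sector angle = relative frequency * 360
    print(f"{course:<12} starts at {start:6.1f}, spans {angles[course]:5.1f} degrees")
    start += angles[course]
```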

Example 2.4

Using R
• Install R from www.r-project.org
• R is a computer language for statistical computing, and it is open-source software.

Bar chart (R)
[Figure: bar chart titled "Majors"; x-axis: Biotech, Business, Engineering, Infotech, others; y-axis: No. of students, 0 to 8]

Pie chart (R)
[Figure: pie chart titled "majors" with sectors Biotech, Business, others, Engineering, Infotech]

2.3 Organizing and Graphing
Quantitative Data
 Small data set
 Dotplot
 Stem-and-leaf displays
 Single-Valued Classes
 Grouped data
 Frequency Distribution for quantitative data
 Relative frequency and percentage distributions
 Histogram
 Polygon
 Cumulative frequency distribution
 Ogive / Cumulative frequency curve

Quantitative Data
Small data set

• Dotplot display
• Stem and leaf display
Dotplot display
• Displays the data of a sample by representing each piece of data with a dot positioned along a scale (horizontal or vertical).
• The frequency of the values is represented along the other scale.
Example 2.5
Dotplot display
• A sample of 19 exam grades was randomly selected from a large class:

76 74 82 96 66 76 78 72 52 68 86 84 62 76 78
92 82 74 88

Construct a dotplot of these data.
Dotplot display
Solution (Example 2.5)
[Figure: dotplot; horizontal axis: exam grades, with ticks at 60, 70, 80, 90]
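A rough text version of the dotplot can be produced as below. This is only a sketch: it assumes the run-together digits in Example 2.5 split into 19 two-digit grades, and it places the grades on a vertical scale with the dots running horizontally.

```python
from collections import Counter

# The 19 exam grades of Example 2.5 (two-digit split is an assumption).
grades = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68,
          86, 84, 62, 76, 78, 92, 82, 74, 88]

counts = Counter(grades)  # grade -> number of dots

# One row per distinct grade; each dot stands for one observation.
for grade in sorted(counts):
    print(f"{grade:>3} | " + "." * counts[grade])
```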
Stem and leaf display
• Each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in a display.
• Note:
  - It is constructed only for quantitative data.
  - It has an advantage over a frequency distribution: we do not lose information on individual observations.
Back-to-back display:

Female | Stem | Male
    26 |  3   | 179
  1145 |  4   | 57789
 06889 |  5   | 367
   026 |  6   | 7

Display with decimal leaves:

11 | 179
12 | 57789
13 | 367
Key: 11 | 1 = 11.1
Stem and leaf display (cont'd)
1) Arrange the data in order.
2) Separate the data according to the classes (first digit).
3) Construct a plot with the leading digit as the stem and the trailing digit as the leaf.
Example 2.6
• The following are the scores of 30 college students on a statistics test.
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
• Construct a stem and leaf display for these data.
Solution (Example 2.6)
• Put the data in order:
50 52 57 61 64 65 68 69 71 71 72 72 75 76 77 79 79
80 81 83 84 86 87 87 92 92 93 95 96 98

• The distribution peaks in the center, and there are no gaps in the data.
• For 9 of the 30 students, the scores were between 71 and 79.
• The scores range from a minimum of 50 to a maximum of 98 on the statistics test.
Key: 5|0 = 50
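The display for Example 2.6 can be reproduced with a short script. This is a sketch for two-digit data only: the tens digit is taken as the stem and the units digit as the leaf, and stems with no observations would simply not be printed.

```python
from collections import defaultdict

# Scores of the 30 students in Example 2.6.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

stems = defaultdict(list)
for score in sorted(scores):         # step 1: arrange the data in order
    stem, leaf = divmod(score, 10)   # steps 2 and 3: leading digit, trailing digit
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(f"{stem} | " + " ".join(str(leaf) for leaf in stems[stem]))
print("Key: 5 | 0 = 50")
```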
Example 2.7
 The following data are the monthly rents paid by a
sample of 30 households selected from a city.
429 585 732 675 550 989 1020
620 750 660 540 578 956 1030 1070
930 871 765 880 975 650 1020 950
840 780 870 900 800 750 820
 Construct a stem-and-leaf display for these data.
Solution (Example 2.7)
Single-Valued Classes
• Single-valued classes are used if the observations in a data set assume only a few distinct values (classes that are made of single values and not of intervals).
• They are useful in cases of discrete data with only a few possible values.
Single-Valued Classes (cont'd)
• Example 2.8
  - A sample of 40 randomly selected households from a city produced the following data on the number of vehicles owned:

5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3

  - Construct a frequency distribution table for these data.

Vehicles owned   Number of households (f)
0                2
1                18
2                11
3                4
4                3
5                2
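The table above can be checked by counting each distinct value, for instance with this minimal Python sketch (standard library only; variable names are illustrative):

```python
from collections import Counter

# Number of vehicles owned by the 40 households in Example 2.8.
vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
            1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
            2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
            4, 2, 1, 1, 2, 1, 1, 4, 1, 3]

frequency = Counter(vehicles)  # each distinct value is its own class

for value in range(min(vehicles), max(vehicles) + 1):
    print(f"{value:>2} {frequency[value]:>3}")
```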
Example 2.8 (Excel)
[Figures: Excel screenshots for Example 2.8]
Frequency Distribution for Quantitative Data
• Class
  - An interval that includes all the values that fall within two numbers, the lower and upper limits.
• Class limits
  - Endpoints of each interval.
• Class boundary
  - The dividing line between two classes. It is given by the midpoint of the upper limit of one class and the lower limit of the next higher class.

Grouped Data
Frequency Distribution for Quantitative Data (cont'd)
• Class width / class size
  - Class width is the difference between the upper and lower class boundaries.

$$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$$

• Class mark / class midpoint
  - Class mark is the midpoint of the class interval.

$$\text{Class mark} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$

HIV Positive Cases in Malaysia by age groups

Age group   No. of HIV +ve cases   Class boundaries        Class width   Class midpoint
2-12        532                    1.5 to less than 12.5   11            7
13-19       1140
20-29       27995
30-39       34770
40-49       12580

Fourth class =
Lower boundary of 3rd class =
Lower limit of 3rd class =
Upper boundary of 3rd class =
Upper limit of 3rd class =
19.5 - 12.5 =
(13 + 19) / 2 =
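The remaining columns of the table follow mechanically from the class limits. The sketch below assumes the gap between consecutive classes is the same throughout (here 1 year), so each boundary lies half a gap beyond the corresponding limit.

```python
# Class limits (lower, upper) of the age groups in the table.
classes = [(2, 12), (13, 19), (20, 29), (30, 39), (40, 49)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

rows = []
for lower, upper in classes:
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary  # upper boundary - lower boundary
    midpoint = (lower + upper) / 2           # class mark
    rows.append((lower_boundary, upper_boundary, width, midpoint))
    print(f"{lower}-{upper}: {lower_boundary} to less than {upper_boundary}, "
          f"width {width}, midpoint {midpoint}")
```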


Constructing frequency distribution tables
• Determine the number of classes, which usually varies from 5 to 20, depending mainly on the number of observations in the data set.
  - Find $2^k$, where k is the smallest number such that $2^k$ is greater than the number of observations (n); k is then used as the number of classes.
• Determine the class interval or width (i). The classes must cover at least the distance from the smallest value (L) in the raw data up to the largest value (H).

$$\text{Approximate class width} = \frac{\text{Largest value } (H) - \text{Smallest value } (L)}{\text{number of classes}}$$

Constructing frequency distribution tables (cont'd)
• Determine the lower limit of the first class, or the starting point.
  - Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.

Example 2.9
• The data give the 1999 total payrolls (rounded to millions) for all 30 major league baseball teams. Construct a frequency distribution table.
Example 2.9 (Cont’d)
Team Total Payroll Team Total Payroll
(million dollars) (million dollars)

Anaheim 51 Milwaukee 43
Arizona 70 Minnesota 16
Atlanta 79 Montreal 15
Baltimore 75 New York Mets 72
Boston 72 New York Yankees 92
Chicago Cubs 55 Oakland 25
Chicago White Sox 25 Philadelphia 30
Cincinnati 38 Pittsburgh 24
Cleveland 74 St. Louis 46
Colorado 54 San Diego 47
Detroit 37 San Francisco 46
Florida 15 Seattle 45
Houston 56 Tampa Bay 38
Kansas City 17 Texas 81
Los Angeles 77 Toronto 49
Solution (Example 2.9)
• $2^4 = 16$ and $2^5 = 32 > 30$, so k = 5.
• Min = 15, Max = 92.

$$\text{Approximate class width} = \frac{92 - 15}{5} = 15.4$$

So, the class width is 15 units.
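The same two steps can be sketched in Python. The loop finds the smallest k with 2^k greater than n, and the approximate width divides the range by k; rounding it to a convenient value (15 here) remains a judgment call.

```python
# 1999 payrolls (million dollars) of the 30 teams in Example 2.9.
payrolls = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54, 37, 15, 56, 17, 77,
            43, 16, 15, 72, 92, 25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

n = len(payrolls)

# Smallest k such that 2**k exceeds the number of observations.
k = 1
while 2 ** k <= n:
    k += 1

low, high = min(payrolls), max(payrolls)
approx_width = (high - low) / k

print(f"n = {n}, k = {k}, min = {low}, max = {high}, "
      f"approximate class width = {approx_width:.1f}")
```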
Solution (Example 2.9)
Total Payroll (million dollars)   Tally   f
15-29
30-44
45-59
60-74
75-89
90-104
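The f column can be filled in by counting how many payrolls fall between the limits of each class. A minimal sketch, using the six classes listed above:

```python
# 1999 payrolls (million dollars) of the 30 teams in Example 2.9.
payrolls = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54, 37, 15, 56, 17, 77,
            43, 16, 15, 72, 92, 25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

# Class limits (lower, upper), both inclusive.
classes = [(15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]

frequency = {
    (lower, upper): sum(1 for p in payrolls if lower <= p <= upper)
    for lower, upper in classes
}

for (lower, upper), f in frequency.items():
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(frequency.values()))
```

Six classes appear although k = 5, because five classes of width 15 starting at 15 would stop at 89 and miss the largest value, 92.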
Relative frequency and Percentage distributions
• Relative frequency of a class

$$\text{Relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{Relative frequency} \times 100\%$$

• Example 2.10
  - Calculate the relative frequency and percentage distributions for the data in Example 2.9.
Solution (Example 2.10)
Total Payroll (million dollars)   Tally   f   Class Boundaries   Relative Frequency   Percentage
15-29
30-44
45-59
60-74
75-89
90-104
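The remaining columns can be sketched as follows. The class frequencies are not given on the slide; the values below were counted from the Example 2.9 payrolls and should be checked against your own tally. Limits are whole numbers, so each boundary is taken 0.5 beyond the limit.

```python
classes = [(15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]
frequencies = [7, 5, 9, 4, 4, 1]  # counted from the Example 2.9 data

total = sum(frequencies)

rows = []
for (lower, upper), f in zip(classes, frequencies):
    boundaries = (lower - 0.5, upper + 0.5)  # class boundaries
    relative = f / total                     # f / sum of f
    rows.append((boundaries, relative, 100 * relative))
    print(f"{lower}-{upper}: {boundaries[0]} to less than {boundaries[1]}, "
          f"relative frequency {relative:.3f}, {100 * relative:.1f}%")
```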
Histogram and Polygon
• Grouped (quantitative) data can be displayed in a histogram or a polygon.
• Histogram: three types
  - Frequency histogram
  - Relative frequency histogram
  - Percentage histogram
Histogram
• Procedure to draw a histogram:
  - Mark the class boundaries of each interval on the horizontal axis.
  - Mark the frequencies (or relative frequencies or percentages) on the vertical axis.
  - Draw a bar for each class so that its height represents the frequency of that class. (There are no gaps between the bars.)
  - Label the histogram.
Histogram (cont'd)
• A frequency histogram consists of a set of rectangles having
  - bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval sizes;
  - areas proportional to the class frequencies.
• If the class intervals all have equal size, the heights of the rectangles are proportional to the class frequencies.
• Otherwise, the heights of the rectangles must be adjusted:

$$\text{Adjusted frequency} = \frac{\text{Standard class width}}{\text{Class width}} \times \text{Frequency}$$

Histogram and Polygon (cont'd)
• Example 2.10
  - The frequency distribution gives the weights of 35 objects, measured to the nearest kg. Draw a histogram to illustrate the data.

Weight (kg)   6 – 8   9 – 11   12 – 17   18 – 20   21 – 29
Frequency     4       6        10        3         12
Histogram and Polygon (cont'd)
• Solution:

$$\text{Adjusted frequency} = \frac{\text{Standard class width}}{\text{Class width}} \times \text{Frequency}$$

Weight (kg)   Class width   Frequency   Height of rectangle (adjusted frequency)
6 – 8         3             4           4
9 – 11        3             6           6
12 – 17       6             10
18 – 20       3             3
21 – 29       9             12
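The blank heights can be worked out with the adjustment formula. The sketch below takes 3 kg as the standard class width, which is an assumption consistent with the first two rows of the table (width 3, height equal to the frequency).

```python
# (lower limit, upper limit) and frequency of each weight class.
classes = [((6, 8), 4), ((9, 11), 6), ((12, 17), 10), ((18, 20), 3), ((21, 29), 12)]

standard_width = 3  # assumed standard class width (kg)

adjusted = []
for (lower, upper), f in classes:
    width = (upper + 0.5) - (lower - 0.5)  # upper boundary - lower boundary
    height = f * standard_width / width    # adjusted frequency
    adjusted.append(height)
    print(f"{lower}-{upper}: width {width:.0f}, frequency {f}, height {height:.1f}")
```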
[Figure: histogram; x-axis: Weight (kg), marked at the class boundaries 5.5, 8.5, 11.5, 17.5, 20.5, 29.5; y-axis: Adjusted Frequency]
Shape of Histogram
[Figures: example histograms of each shape: symmetric (two examples), bell-shaped, uniform, left-skewed, right-skewed, unimodal, bimodal]
Polygon
• A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram. We then mark two more classes (with zero frequencies), one at each end, and mark their midpoints.
• Three types of polygon:
  - Frequency polygon
  - Relative frequency polygon
  - Percentage polygon
Example 2.11
Weight (kg)   Class mark   No. of students
60 – 62       61           3
63 – 65       64           4
66 – 68       67           5
69 – 71       70           6
72 – 74       73           2
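The vertices of the frequency polygon for this table can be listed as below. This sketch assumes equally spaced class marks, so the two extra zero-frequency classes sit one class width beyond each end.

```python
class_marks = [61, 64, 67, 70, 73]
frequencies = [3, 4, 5, 6, 2]

step = class_marks[1] - class_marks[0]  # distance between class marks

# Add one zero-frequency class at each end, then pair marks with frequencies.
points = (
    [(class_marks[0] - step, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + step, 0)]
)

print(points)
```

Joining these points in order with straight lines gives the polygon; it starts and ends on the horizontal axis.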
[Figures: Histogram and Polygon for the data above; x-axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76; y-axis: No. of students]
Example 2.12
Weight (kg)   Class mark   Frequency   Relative frequency   Percentage
60 – 62       61           3           0.15                 15
63 – 65       64           4           0.2                  20
66 – 68       67           5           0.25                 25
69 – 71       70           6           0.3                  30
72 – 74       73           2           0.1                  10
                           Σf = 20     1.00                 100 %
Relative Frequency
[Figures: Relative Frequency Histogram and Relative Frequency Polygon; x-axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76; y-axis: Relative frequency]
Percentage
[Figures: Percentage Histogram and Percentage Polygon; x-axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76]


Cumulative frequency distribution
• A table that presents the total number of values that fall below the upper boundary of each class.
• It is constructed for quantitative data only.

$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of that class}}{\text{sum of all frequencies in the data set}}$$

$$\text{Cumulative percentage} = \text{Cumulative relative frequency} \times 100\%$$

Example 2.13

Weight (kg)   f
60 – 62       3
63 – 65       4
66 – 68       5
69 – 71       6
72 – 74       2

Weight (kg)   Cumulative frequency
< 59.5
< 62.5
< 65.5
< 68.5
< 71.5
< 74.5
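The cumulative frequencies are running totals of f, starting from zero below the first lower boundary. A minimal sketch using the standard library:

```python
from itertools import accumulate

frequencies = [3, 4, 5, 6, 2]                        # f for each class
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]    # "less than" values

# No observation lies below 59.5, so the running total starts at 0.
cumulative = [0] + list(accumulate(frequencies))

for boundary, total in zip(boundaries, cumulative):
    print(f"< {boundary}: {total}")
```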
Example 2.14
Weight (kg)   Cumulative relative frequency   Cumulative percentage
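Assuming this example continues with the data of Example 2.13, the two columns follow by dividing each cumulative frequency by the total, as in this sketch. The cumulative frequencies are computed from the f column and are not stated on the slide.

```python
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
cumulative = [0, 3, 7, 12, 18, 20]  # running totals of f from Example 2.13

total = cumulative[-1]  # sum of all frequencies

cumulative_relative = [c / total for c in cumulative]
cumulative_percentage = [100 * r for r in cumulative_relative]

for b, r, p in zip(boundaries, cumulative_relative, cumulative_percentage):
    print(f"< {b}: {r:.2f}  {p:.0f}%")
```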
Ogive / Cumulative frequency curve
• A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
• Note:
  1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.
  2. If relative cumulative frequency is used in place of cumulative frequency, the graph is called a relative cumulative frequency curve or percentage ogive.
Example 2.15
• Draw an ogive for the data in Example 2.13. Estimate from the ogive:
  a) the number of students whose weight was less than 68.3 kg;
  b) the value of X, if 20% of the students had a weight of X kg or more.
Cumulative frequency curve
[Figure: ogive; x-axis: Weight (kg), marked at 59.5, 62.5, 65.5, 68.5, 71.5, 74.5; y-axis: Cumulative Frequency]
Cumulative frequency curve /
Ogive (cont'd)
 Solution:
a)

b)
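Readings from the ogive can be approximated numerically. The sketch below assumes the plotted points are joined by straight lines (a hand-drawn smooth curve may give slightly different estimates); the helper names `cumulative_below` and `weight_at` are illustrative, not from the slides.

```python
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
cumulative = [0, 3, 7, 12, 18, 20]  # cumulative frequencies from Example 2.13


def cumulative_below(x):
    """Estimated number of students weighing less than x kg."""
    for i in range(1, len(boundaries)):
        x0, x1 = boundaries[i - 1], boundaries[i]
        if x0 <= x <= x1:
            y0, y1 = cumulative[i - 1], cumulative[i]
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x is outside the range of the ogive")


def weight_at(count):
    """Estimated weight below which `count` students fall."""
    for i in range(1, len(cumulative)):
        y0, y1 = cumulative[i - 1], cumulative[i]
        if y0 <= count <= y1:
            x0, x1 = boundaries[i - 1], boundaries[i]
            return x0 + (count - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("count is outside the range of the ogive")


# a) students weighing less than 68.3 kg
part_a = cumulative_below(68.3)

# b) 20% weigh X kg or more, so 80% of the 20 students weigh less than X
part_b = weight_at(0.80 * cumulative[-1])

print(f"a) about {part_a:.1f} students, b) X is about {part_b:.1f} kg")
```

Under these assumptions, part a) comes to roughly 12 students and part b) to about 70.5 kg.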
The End
Chapter 2
