UCCM2233 - Chp2 Organizing Data - Wble
Organizing Data
Learning Outcome
Organize data in an intelligible manner from a
probabilistic and statistical point of view.
Qualitative
- Categorical
- Not measurable
Nominal
- Refers to a characteristic
e.g. gender, race, yes/no
Ordinal
- Can be ranked in order
e.g. preference, importance
2.1 Some Definitions
Raw Data
Data in the sequence in which they are collected
and before they are processed or ranked.
Examples:
Table 1: The weights of 20 students in kg (Quantitative
raw data)
61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71
Table 2: The grades of UCCM 1213 of 20 students
(Qualitative raw data)
A B C A C B B A B C
B A B B B A C D D B
Arrays
An arrangement of numerical raw data in
ascending order or descending order of
magnitude.
Example:
60 61 62 63 64 65 65 66 67 67
68 68 69 69 70 70 71 71 73 74
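As a small sketch, the array above can be reproduced from the Table 1 weights with Python's built-in `sorted`. The variable names are illustrative only.

```python
# Raw weights (kg) of the 20 students in Table 1, in collection order.
weights = [61, 68, 65, 67, 68, 71, 69, 63, 74, 64,
           66, 65, 62, 67, 60, 73, 69, 70, 70, 71]

ascending = sorted(weights)                  # array in ascending order
descending = sorted(weights, reverse=True)   # array in descending order

print(ascending)
print(descending)
```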
Ungrouped Data
Contains information on each member of a
sample or population individually.
Examples:
Table 1 and Table 2.
Grouped Data
Data presented in classes or intervals.
Example:
UCCM1213 Scores       10 – 12   13 – 15   16 – 18   19 – 21
Number of students    4         12        20        14
2.2 Organizing and Graphing
Qualitative Data
Frequency Distribution for Qualitative Data
[Slides: worked solution and Excel steps for building the frequency table; screenshots not reproduced.]
Relative Frequency and Percentage Distributions
Tabular arrangement that lists the relative frequencies and percentages for all categories.
\[ \text{Relative frequency of a category} = \frac{\text{frequency of the category}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
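A minimal sketch of a relative frequency and percentage distribution, using the Table 2 grades as sample data. It assumes the usual convention that the percentage is the relative frequency multiplied by 100; the variable names are illustrative.

```python
from collections import Counter

# Grades of the 20 students in Table 2 (qualitative raw data).
grades = "A B C A C B B A B C B A B B B A C D D B".split()

freq = Counter(grades)        # frequency f of each category
total = sum(freq.values())    # sum of all frequencies

table = {}
for category in sorted(freq):
    f = freq[category]
    rel = f / total                       # relative frequency = f / sum of f
    table[category] = (f, rel, rel * 100)
    print(f"{category}  f={f:2d}  relative={rel:.2f}  percent={rel * 100:.0f}%")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick check on the table.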
Graphical Presentation of Qualitative Data
Bar Chart (bar graph)
A graph made of bars whose heights represent the
frequencies of the respective categories.
Example 2.3
Construct a bar chart for the data in
Example 2.1.
Graphical Presentation of Qualitative Data
Pie Chart
A circle divided into portions that represent the
relative frequencies or percentages of a
population or a sample belonging to different
categories.
The first sector starts at 12 o'clock.
The sectors proceed clockwise from the largest to the
smallest.
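The layout rule above can be sketched as a small calculation: each sector's angle is its relative frequency times 360 degrees, and the sectors are placed clockwise from 12 o'clock in decreasing order of size. The Table 2 grades serve as sample data here, which is an illustrative choice and not the slide's own example.

```python
from collections import Counter

grades = "A B C A C B B A B C B A B B B A C D D B".split()
freq = Counter(grades)
total = sum(freq.values())

# Largest sector first, then progressively smaller ones.
ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)

sectors = []   # (category, start angle, sector angle), in degrees
start = 0.0    # measured clockwise from the 12 o'clock position
for category, f in ordered:
    angle = 360 * f / total   # sector angle = relative frequency * 360
    sectors.append((category, start, angle))
    start += angle

for category, begin, angle in sectors:
    print(f"{category}: starts at {begin:5.1f} deg, spans {angle:5.1f} deg")
```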
Example 2.4
Using R
Bar chart (R)
[Figure: bar chart of Majors; vertical axis: No. of students, from 0 to 8.]
Pie chart (R)
[Figure: pie chart of majors; legend: Biotech, Business, others, Engineering, Infotech.]
2.3 Organizing and Graphing
Quantitative Data
Small data set
Dotplot
Stem-and-leaf displays
Single-Valued Classes
Grouped data
Frequency Distribution for quantitative data
Relative frequency and percentage distributions
Histogram
Polygon
Cumulative frequency distribution
Ogive / Cumulative frequency curve
Dotplot display
[Figure: dotplot of exam grades; horizontal axis from 60 to 90.]
Stem and leaf display
Each value is divided into two portions:
a stem & a leaf. The leaves for each stem are
shown separately in a display.
Note:
It is constructed only for quantitative data.
It has an advantage over a frequency distribution
because no information on individual
observations is lost.
Back-to-back display:
 Female |   | Male
     26 | 3 | 179
   1145 | 4 | 57789
  06889 | 5 | 367
    026 | 6 | 7

Single display:
11 | 179
12 | 57789
13 | 367
Key: 11 | 1 = 11.1
Stem and leaf display (cont'd)
1) Arrange the data in order.
2) Separate the data according to the
classes (first digit).
3) Construct a plot with the leading digit as the
stem and the trailing digit as the leaf.
Example 2.6
[Figure: stem-and-leaf display of scores on the statistics test, from a minimum score of 50 to a maximum score of 98.]
Key: 5 | 0 = 50
Example 2.7
The following data are the monthly rents paid by a
sample of 30 households selected from a city.
429 585 732 675 550 989 1020
620 750 660 540 578 956 1030 1070
930 871 765 880 975 650 1020 950
840 780 870 900 800 750 820
Construct a stem-and-leaf display for these data.
Solution (Example 2.7)
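One possible way to build the display in code is sketched below. It assumes the hundreds (and thousands) digits form the stem and the last two digits form the leaf, which is a choice made here for illustration because the rents range from 429 to 1070; the function name `stem_and_leaf` and its `stem_unit` parameter are hypothetical.

```python
from collections import defaultdict

# Monthly rents of the 30 households in Example 2.7.
rents = [429, 585, 732, 675, 550, 989, 1020,
         620, 750, 660, 540, 578, 956, 1030, 1070,
         930, 871, 765, 880, 975, 650, 1020, 950,
         840, 780, 870, 900, 800, 750, 820]

def stem_and_leaf(values, stem_unit=100):
    """Map each stem (value // stem_unit) to its sorted leaves (value % stem_unit)."""
    display = defaultdict(list)
    for value in sorted(values):          # step 1: arrange the data in order
        display[value // stem_unit].append(value % stem_unit)
    return dict(display)

display = stem_and_leaf(rents)
for stem in sorted(display):
    leaves = " ".join(f"{leaf:02d}" for leaf in display[stem])
    print(f"{stem:2d} | {leaves}")
```

Because every leaf is kept, the original observations can be read back from the display, for example `4 | 29` stands for 429.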
Single-Valued Classes
Single-valued classes are used if the
observations in a data set assume only a few
distinct values (the classes are made of
single values and not of intervals).
They are useful for discrete data with only
a few possible values.
Single-Valued Classes (cont’d)
Example 2.8
A sample of 40 randomly selected households from a
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Vehicles owned    Number of households (f)
0                 2
1                 18
2                 11
3                 4
4                 3
5                 2
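The frequency table above can be checked with a short sketch that counts each distinct value in the Example 2.8 data. `Counter` is from Python's standard library; the variable names are illustrative.

```python
from collections import Counter

# Number of vehicles owned by each of the 40 households in Example 2.8.
vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
            1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
            2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
            4, 2, 1, 1, 2, 1, 1, 4, 1, 3]

freq = Counter(vehicles)

# Each distinct value is its own class (single-valued classes).
print("Vehicles owned  f")
for value in range(min(vehicles), max(vehicles) + 1):
    print(f"{value:14d}  {freq[value]}")
```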
Example 2.8 (Excel)
Frequency Distribution for
quantitative data
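Class
An interval that includes all the values that fall
within its lower and upper limits.
Class Boundary
The dividing line between two classes. It is given
by the midpoint of the upper limit of one class and
the lower limit of the next class.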
Frequency Distribution for
quantitative data (cont'd)
Class width / class size
Class width is the difference between the upper
and lower class boundary.
Class width = Upper boundary - Lower boundary
Example 2.9
Anaheim 51 Milwaukee 43
Arizona 70 Minnesota 16
Atlanta 79 Montreal 15
Baltimore 75 New York Mets 72
Boston 72 New York Yankees 92
Chicago Cubs 55 Oakland 25
Chicago White Sox 25 Philadelphia 30
Cincinnati 38 Pittsburgh 24
Cleveland 74 St. Louis 46
Colorado 54 San Diego 47
Detroit 37 San Francisco 46
Florida 15 Seattle 45
Houston 56 Tampa Bay 38
Kansas City 17 Texas 81
Los Angeles 77 Toronto 49
Solution (Example 2.9)
\( 2^4 = 16 \), \( 2^5 = 32 \), so \( k = 5 \).
Min = 15, Max = 92.
\[ \text{Approximate class width} = \frac{92 - 15}{5} = 15.4 \]
So, class width is 15 units.
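The arithmetic above can be sketched in a few lines. The rule used for the number of classes, the smallest k with 2^k at least the number of observations, is an assumption inferred from the powers of 2 quoted in the solution; the slides do not state it explicitly here.

```python
# Total payrolls (million dollars) of the 30 teams in Example 2.9.
payrolls = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54, 37, 15, 56, 17, 77,
            43, 16, 15, 72, 92, 25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

n = len(payrolls)

# Assumed rule: smallest k such that 2**k >= n.
k = 1
while 2 ** k < n:
    k += 1

approx_width = (max(payrolls) - min(payrolls)) / k

print(f"n = {n}, k = {k}")
print(f"min = {min(payrolls)}, max = {max(payrolls)}")
print(f"approximate class width = {approx_width:.1f}")
```

The slides then round 15.4 to a convenient width of 15 units.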
Solution (Example 2.9)
Total Payroll (million dollars)    Tally    f
15-29
30-44
45-59
60-74
75-89
90-104
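As a sketch, the tally and frequency columns for these classes can be filled in by counting how many payrolls fall within each pair of class limits. The class limits are copied from the table; the variable names are illustrative.

```python
# Total payrolls (million dollars) of the 30 teams in Example 2.9.
payrolls = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54, 37, 15, 56, 17, 77,
            43, 16, 15, 72, 92, 25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

# Class limits as listed in the table (width 15, starting at the minimum).
classes = [(15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]

frequencies = []
for lower, upper in classes:
    f = sum(lower <= x <= upper for x in payrolls)   # count values in the class
    frequencies.append(f)
    print(f"{lower:3d}-{upper:<3d}  {'|' * f:<10}  {f}")

print("sum of f =", sum(frequencies))
```

Note that a width of 15 starting at 15 needs six classes to reach the maximum of 92, one more than the k = 5 suggested by the rule of thumb.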
Relative frequency and
Percentage distributions
\[ \text{Relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
Percentage histogram
Histogram
Procedures to draw a histogram:
1) Mark the class boundaries of each interval on the
horizontal axis.
2) Mark the frequencies (or relative
frequencies or percentages) on the vertical axis.
3) Draw a bar for each class so that its height
represents the frequency of that class, leaving no gaps
between the bars.
4) Label the histogram.
Histogram (cont'd)
A frequency histogram consists of a set of rectangles
having
bases on a horizontal axis, with centres at the class
marks and lengths equal to the class interval sizes, and
areas proportional to the class frequencies.
If the class intervals all have equal size,
the heights of the rectangles are proportional to the class
frequencies.
Otherwise,
the heights of the rectangles must be adjusted:
\[ \text{Adjusted frequency} = \text{Frequency} \times \frac{\text{Standard class width}}{\text{Class width}} \]
Histogram and Polygon (cont'd)
Example 2.10
The frequency distribution gives the weight of
35 objects, measured to the nearest kg. Draw
a histogram to illustrate the data.
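Weight (kg)   Class width   Frequency   Adjusted frequency
6 – 8         3             4           4
9 – 11        3             6           6
12 – 17       6             10          5
18 – 20       3             3           3
21 – 29       9             12          4
[Figure: histogram; vertical axis: Adjusted Frequency; horizontal axis: Weight (kg), with class boundaries 5.5, 8.5, 11.5, 17.5, 20.5, 29.5.]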
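A minimal sketch of the adjustment for Example 2.10, applying adjusted frequency = frequency × standard class width / class width. The standard class width of 3 is an assumption, chosen because the classes of width 3 keep their original frequencies; class boundaries are taken half a unit beyond the class limits since weights are measured to the nearest kg.

```python
# (lower limit, upper limit, frequency) for each class in Example 2.10.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]

STANDARD_WIDTH = 3   # assumed standard class width

rows = []
for lower, upper, f in classes:
    lower_boundary = lower - 0.5          # e.g. 5.5 for the class 6-8
    upper_boundary = upper + 0.5          # e.g. 8.5 for the class 6-8
    width = upper_boundary - lower_boundary
    adjusted = f * STANDARD_WIDTH / width
    rows.append((lower_boundary, upper_boundary, width, adjusted))
    print(f"{lower:2d}-{upper:<2d}  width={width:.0f}  f={f:2d}  adjusted={adjusted:.0f}")
```

With the adjustment, the area of each bar (not its height) stays proportional to the class frequency.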
Shape of Histogram
[Figures: example histograms illustrating Symmetric, Left Skewed, Right Skewed, Unimodal and Bimodal shapes.]
Polygon
A polygon is a line graph formed by joining the
midpoints of the tops of successive bars in a
histogram.
To close the polygon, we add two more classes (with zero
frequencies), one at each end, and join their
midpoints to the graph.
Three types of polygon:
Frequency polygon
Relative frequency polygon
Percentage polygon
Example 2.11
Weight (kg) Class mark No. of students
60 – 62 61 3
63 – 65 64 4
66 – 68 67 5
69 – 71 70 6
72 – 74 73 2
[Figures: Histogram and Polygon; vertical axis: No. of students, from 0 to 7; horizontal axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76.]
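The points of the frequency polygon for Example 2.11 can be sketched as follows: one point per class mark, plus one extra zero-frequency class at each end. The variable names are illustrative.

```python
# Class marks and frequencies from Example 2.11.
class_marks = [61, 64, 67, 70, 73]
frequencies = [3, 4, 5, 6, 2]

# Distance between consecutive class marks equals the class width.
width = class_marks[1] - class_marks[0]

# Add a zero-frequency class at each end so the polygon meets the axis.
points = ([(class_marks[0] - width, 0)]
          + list(zip(class_marks, frequencies))
          + [(class_marks[-1] + width, 0)])

for x, y in points:
    print(f"weight = {x} kg, frequency = {y}")
```

Joining these points in order with straight lines gives the frequency polygon; replacing the frequencies with relative frequencies or percentages gives the other two types.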
Example 2.12
Weight (kg) Class mark Frequency Relative frequency Percentage
60 – 62 61 3 0.15 15
63 – 65 64 4 0.2 20
66 – 68 67 5 0.25 25
69 – 71 70 6 0.3 30
72 – 74 73 2 0.1 10
Σf = 20 1.00 100 %
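The table can be rebuilt with a short sketch. It assumes the frequencies come from the Table 1 weights, which the counts are consistent with; the class limits are copied from the table and the variable names are illustrative.

```python
# Weights (kg) of the 20 students in Table 1.
weights = [61, 68, 65, 67, 68, 71, 69, 63, 74, 64,
           66, 65, 62, 67, 60, 73, 69, 70, 70, 71]

classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
n = len(weights)

rows = []
for lower, upper in classes:
    mark = (lower + upper) / 2                       # class mark (midpoint)
    f = sum(lower <= w <= upper for w in weights)    # class frequency
    rel = f / n                                      # relative frequency
    rows.append((lower, upper, mark, f, rel, rel * 100))
    print(f"{lower}-{upper}  mark={mark:.0f}  f={f}  rel={rel:.2f}  pct={rel * 100:.0f}")
```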
Relative Frequency
[Figures: Relative Frequency Histogram and Relative Frequency Polygon; vertical axis: Relative frequency, from 0 to 0.35; horizontal axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76.]
Percentage
[Figures: Percentage Histogram and Percentage Polygon; vertical axis from 0 to 35; horizontal axis marked at 58, 61, 64, 67, 70, 73, 76.]
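[Figure: cumulative frequency curve; vertical axis: Cumulative Frequency, from 0 to 25; horizontal axis: Weight (kg), marked at the class boundaries 59.5, 62.5, 65.5, 68.5, 71.5, 74.5.]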
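The points plotted in a cumulative frequency curve of this kind can be sketched as running totals of the class frequencies, placed at the upper class boundaries. The frequencies below are those of the weight classes 60 – 62 to 72 – 74 used earlier; pairing each total with the upper boundary is the usual "less than" convention and is stated here as an assumption.

```python
from itertools import accumulate

# Upper class boundaries and frequencies for the weight classes 60-62 ... 72-74.
upper_boundaries = [62.5, 65.5, 68.5, 71.5, 74.5]
frequencies = [3, 4, 5, 6, 2]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# The curve starts at zero at the lowest class boundary.
ogive_points = [(59.5, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, total in ogive_points:
    print(f"less than {boundary} kg: {total} students")
```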
Cumulative frequency curve /
Ogive (cont'd)