Chapter 02
By
Dr. Abdelfattah Mustafa
Associate Professor of Mathematical Statistics
February 8, 2024
Dr. Abdelfattah Mustafa STAT 3111: General Statistics February 8, 2024 1 / 43
Contents
1 Introduction
2 Organizing Data
Introduction
When conducting a statistical study, the researcher must gather data for the
particular variable under study.
After organizing the data, the researcher must present them so they can be
understood.
Organizing Data
When the data are in original form, they are called raw data and are listed
next.
69 65 76 59 74 81 73 38 57 49
61 69 49 85 65 78 68 69 56 54
67 64 43 82 78 43 37 68 81 48
80 59 85 40 85 79 77 81 56 52
74 87 90 83 61 69 61 57 71 60
To make the raw data easier to interpret, the researcher organizes them into what is called a frequency distribution.
Two types of frequency distributions that are most often used are the categorical
frequency distribution and the grouped frequency distribution.
The categorical frequency distribution is used for data that can be placed in
specific categories, such as nominal- or ordinal-level data.
Twenty-five army inductees were given a blood test to determine their blood type.
The data set is
A B B AB O O O B AB
B B B O A O A O O
O AB AB A O B A
Solution:
where
\[ \text{Percent} = \frac{\text{frequency}}{\text{Total}} \times 100 \qquad (1) \]
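As a minimal sketch, the categorical frequency distribution for the blood-type data can be tallied in Python and the percent column computed with equation (1). The variable names (`blood_types`, `counts`, `percents`) are illustrative choices, not part of the original slides.

```python
from collections import Counter

# Blood types of the 25 inductees, copied from the data set above.
blood_types = (
    "A B B AB O O O B AB "
    "B B B O A O A O O "
    "O AB AB A O B A"
).split()

counts = Counter(blood_types)   # frequency of each class
total = sum(counts.values())    # total number of observations

# Percent = frequency / Total * 100, as in equation (1).
percents = {cls: 100 * freq / total for cls, freq in counts.items()}

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ("A", "B", "O", "AB"):
    print(f"{cls:<6}{counts[cls]:>10}{percents[cls]:>9.0f}")
print(f"{'Total':<6}{total:>10}{sum(percents.values()):>9.0f}")
```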
For quantitative data, when the range of the raw data is large, the data must
be grouped into classes that are more than one unit in width, in what is called a
grouped frequency distribution.
The researcher must decide how many classes to use and the width of each class.
To construct a frequency distribution, follow these rules:
2 It is preferable but not absolutely necessary that the class width be an odd
number.
6 The classes must be equal in width. One exception occurs when a distribution
has a class that is open-ended.
The following data represent the record high temperatures in degrees Fahrenheit (°F)
for each of the 50 states.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Solution:
The procedure for constructing a grouped frequency distribution for numerical data
follows.
Select the number of classes desired (usually between 5 and 20). In this case, 7 is
arbitrarily chosen. The number of classes can be calculated by:
Class limits:
Lower limit            Upper limit
100                    105 − one unit = 104
100 + width = 105      104 + width = 109
110                    114
115                    119
120                    124
125                    129
130                    134
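The construction of the class limits can be sketched in Python. This sketch assumes that the width is the range divided by the number of classes, rounded up to the next whole number, that the first lower limit is the smallest data value, and that one unit is 1 because the data are whole numbers. These assumptions reproduce the limits listed above, but the rounding rule itself is not stated on this slide.

```python
# Record high temperatures for the 50 states, copied from the data above.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

num_classes = 7                          # chosen arbitrarily, as in the text
data_range = max(temps) - min(temps)     # 134 - 100 = 34

# Assumption: round range / classes up to the next whole number
# (34 / 7 is about 4.86, so the width is 5).
width = data_range // num_classes + 1

unit = 1  # the data are whole numbers
lower_limits = [min(temps) + i * width for i in range(num_classes)]
upper_limits = [lower + width - unit for lower in lower_limits]
class_limits = list(zip(lower_limits, upper_limits))

# Class boundaries sit half a unit outside the limits.
class_boundaries = [(lo - unit / 2, hi + unit / 2) for lo, hi in class_limits]

for (lo, hi), (lb, ub) in zip(class_limits, class_boundaries):
    print(f"{lo}-{hi}   boundaries {lb}-{ub}")
```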
The frequency distribution shows that the class 109.5–114.5 contains the largest
number of temperatures (18) followed by the class 114.5–119.5 with 13
temperatures.
Hence, most of the temperatures (31) fall between 109.5 and 119.5°F.
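The class frequencies quoted above can be checked with a short tally. This is a sketch that assumes the seven classes start at 100 and are 5 units wide, so the class index of a value is `(value - 100) // 5`. The names `first_lower`, `width`, and `frequencies` are illustrative.

```python
# Record high temperatures for the 50 states, copied from the data above.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

first_lower, width, num_classes = 100, 5, 7

# Tally each value into its class: 100-104 is index 0, 105-109 is index 1, ...
frequencies = [0] * num_classes
for t in temps:
    frequencies[(t - first_lower) // width] += 1

for i, freq in enumerate(frequencies):
    lower = first_lower + i * width
    print(f"{lower - 0.5}-{lower + width - 0.5}: {freq}")

# The two most populated classes together cover 109.5 to 119.5.
print("Between 109.5 and 119.5:", frequencies[2] + frequencies[3])
```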
The cumulative frequencies are found by adding the frequencies of all classes up to
and including the class with the specified upper class boundary.
The cumulative frequency distribution for the data in this example is as follows:
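A running total is enough to compute these values. The sketch below uses the class frequencies of the temperature example and Python's `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the temperature example.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for boundary, cum in zip(upper_boundaries, cumulative):
    print(f"Less than {boundary}: {cum}")
```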
Cumulative frequencies are used to show how many data values are accumulated
up to and including a specific class.
After the raw data have been organized into a frequency distribution, they are
analyzed by looking for peaks and extreme values.
The peaks show which class or classes have the most data values compared to
the other classes.
Extreme values, called outliers, are data values that are extremely large or small
relative to the other data values.
When the range of the data values is relatively small, a frequency distribution
can be constructed using single data values for each class.
The data shown here represent the number of miles per gallon (mpg) that 30 selected
four-wheel-drive sports utility vehicles obtained in city driving.
12 17 12 14 16 18 16 18 12 16
17 15 15 16 12 15 16 16 12 14
15 12 15 15 19 13 16 18 16 14
Solution:
The range is R = 19 − 12 = 7, and with k = 7,
\[ \text{width} = \frac{R}{k} = \frac{7}{7} = 1 \]
Therefore, classes consisting of a single data value can be used.
The classes: 12, 13, 14, 15, 16, 17, 18, 19.
In this case, almost one-half (14) of the vehicles get 15 or 16 miles per gallon.
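As a sketch, the single-value classes and their frequencies can be tallied directly from the mileage data. The names `mpg` and `counts` are illustrative.

```python
from collections import Counter

# City mileage (mpg) of the 30 vehicles, copied from the data above.
mpg = [
    12, 17, 12, 14, 16, 18, 16, 18, 12, 16,
    17, 15, 15, 16, 12, 15, 16, 16, 12, 14,
    15, 12, 15, 15, 19, 13, 16, 18, 16, 14,
]

counts = Counter(mpg)

# One class per data value, from the smallest value to the largest.
classes = list(range(min(mpg), max(mpg) + 1))
for value in classes:
    print(f"{value}: {counts[value]}")

# Vehicles that get 15 or 16 miles per gallon.
print("15 or 16 mpg:", counts[15] + counts[16])
```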
Histograms, Frequency Polygons, and Ogives
1 The histogram.
Frequency Histogram:
The histogram is a graph that displays the data by using contiguous vertical
bars of various heights to represent the frequencies of the classes.
Construct a histogram to represent the data shown for the record high temperatures
for each of the 50 states.
Class boundaries Frequency
99.5 – 104.5 2
104.5 – 109.5 8
109.5 – 114.5 18
114.5 – 119.5 13
119.5 – 124.5 7
124.5 – 129.5 1
129.5 – 134.5 1
Solution:
1 Draw and label the x and y axes.
2 Represent the frequency on the y axis and the class boundaries on the x axis.
3 Using the frequencies as the heights, draw vertical bars for each class.
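The steps above amount to computing one rectangle per class. The sketch below derives, for each class, the left edge, the bar width, and the bar height from the class boundaries and frequencies. The triple layout `(left, width, height)` is an illustrative choice, not something prescribed by the slides.

```python
# Class boundaries (8 edges for 7 classes) and frequencies from the table above.
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# One bar per class: it starts at the lower boundary, spans the class width,
# and is as tall as the class frequency. Adjacent bars share an edge, which
# is what makes the bars of a histogram contiguous.
bars = [
    (left, right - left, height)
    for left, right, height in zip(boundaries, boundaries[1:], frequencies)
]

for left, width, height in bars:
    print(f"bar from x = {left} to x = {left + width}, height {height}")
```

With a plotting library such as Matplotlib, these values could be drawn with a call like `pyplot.bar(lefts, heights, width=widths, align="edge")`; that is one possible way to draw the bars, not the method used for the figures in these slides.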
Frequency Polygon:
The frequency polygon is a graph that displays the data by using lines that
connect points plotted for the frequencies at the midpoints of the classes. The
frequencies are represented by the heights of the points.
Solution:
Find the midpoints of each class.
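A short sketch of this step: each midpoint is the average of the lower and upper class boundaries, and the points of the frequency polygon pair each midpoint with its class frequency. The variable names are illustrative.

```python
# Class boundaries and frequencies from the temperature example.
class_boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lower + upper) / 2 for lower, upper in class_boundaries]

# Points to plot: x = class midpoint, y = class frequency.
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(f"midpoint {x}: frequency {y}")
```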
The ogive:
The ogive is a graph that represents the cumulative frequencies for the classes in
a frequency distribution.
Solution:
2 Plot the cumulative frequency at each upper class boundary, as shown in the
following figure.
3 Connect adjacent points with line segments. Then extend the graph to the first
lower class boundary, 99.5, on the x axis.
Cumulative frequency graphs are used to visually represent how many values are
below a certain upper class boundary.
For example, to find out how many record high temperatures are less than
114.5°F, read the cumulative frequency at 114.5 from Figure 3: it is 28.
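The points of the ogive can be sketched as pairs of an upper class boundary and its cumulative frequency, with the graph starting at the first lower class boundary, 99.5, at height 0. The lookup dictionary `below` is an illustrative helper for reading values at the class boundaries only.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the temperature example.
first_lower_boundary = 99.5
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

cumulative = list(accumulate(frequencies))

# The ogive starts at the first lower boundary with cumulative frequency 0,
# then rises to each (upper boundary, cumulative frequency) point.
ogive_points = [(first_lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

# Number of values below each class boundary.
below = dict(ogive_points)
print("Temperatures less than 114.5:", below[114.5])
```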
The histogram, the frequency polygon, and the ogive shown previously were
constructed by using frequencies in terms of the raw data.
Solution:
Distribution Shapes:
Distributions can have other shapes in addition to the ones shown here.
Other Types of Graphs
In addition to the histogram, the frequency polygon, and the ogive, several other
types of graphs are often used in statistics. They are the bar graph, Pareto chart,
time series graph, and pie graph.
Bar Graphs: When the data are qualitative or categorical, bar graphs can be
used to represent the data.
A bar graph represents the data by using vertical or horizontal bars whose
heights or lengths represent the frequencies of the data.
The table shows the average amount of money spent by first-year college students.
Draw horizontal and vertical bar graphs for the data.
Electronics $728
Dorm decor 344
Clothing 141
Shoes 72
Solution:
Draw the bars corresponding to the frequencies.
Figure 4: The bar charts for College Spending for First-Year Students.
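As a rough sketch, a horizontal bar graph of the spending data can be printed as text, with the length of each bar proportional to the amount. The scale of one mark per $25 (`DOLLARS_PER_MARK`) is an arbitrary assumption made only so the bars fit on one line.

```python
# Average spending by first-year college students, from the table above.
spending = {
    "Electronics": 728,
    "Dorm decor": 344,
    "Clothing": 141,
    "Shoes": 72,
}

DOLLARS_PER_MARK = 25  # assumed scale: one '#' for every $25


def bar_length(amount, scale=DOLLARS_PER_MARK):
    """Return the number of marks used to draw a bar for the given amount."""
    return round(amount / scale)


# One horizontal bar per category; longer bars mean more money spent.
for category, amount in spending.items():
    print(f"{category:<12}{'#' * bar_length(amount)} ${amount}")
```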
The Time Series Graph: When data are collected over a period of time, they
can be represented by a time series graph.
A time series graph represents data that occur over a specific period of time.
The number of homicides that occurred in the workplace for the years 2003 to 2008 is
shown. Draw and analyze a time series graph for the data.
Solution:
A Pie Graph: Pie graphs are used extensively in statistics. The purpose of the
pie graph is to show the relationship of the parts to the whole by visually
comparing the sizes of the sections. Percentages or proportions can be used. The
variable is nominal or categorical.
A pie graph is a circle that is divided into sections or wedges according to the
percentage of frequencies in each category of the distribution.
Solution
Since there are 360° in a circle, the frequency for each class must be converted
into a proportional part of the circle. This conversion is done by using the
following formulas:
\[ \% = \frac{f}{n} \times 100, \qquad \text{Degrees} = \frac{f}{n} \times 360. \]
Therefore,
Next, using a protractor and a compass, draw the graph using the appropriate
degree measures, and label each section with its name and percentage, as
shown in the following figure.
Construct a pie graph showing the blood types of the army inductees described in
Example 1. The frequency distribution is repeated here.
Solution:
The number of degrees can be calculated as in the following table
Class Frequency Percent Degree
A 5 20 72
B 7 28 100.8
O 9 36 129.6
AB 4 16 57.6
Total 25 100 360
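The Percent and Degree columns of the table can be reproduced with a few lines of Python that apply the conversion f / n × 100 for percentages and f / n × 360 for degrees. The names `frequencies` and `rows` are illustrative.

```python
# Blood-type frequencies from the table above.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
n = sum(frequencies.values())  # total number of inductees

# For each class: percent = f / n * 100, degrees = f / n * 360.
rows = {cls: (100 * f / n, 360 * f / n) for cls, f in frequencies.items()}

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}{'Degrees':>9}")
for cls, f in frequencies.items():
    percent, degrees = rows[cls]
    print(f"{cls:<6}{f:>10}{percent:>9.0f}{degrees:>9.1f}")

total_degrees = sum(degrees for _, degrees in rows.values())
print(f"{'Total':<6}{n:>10}{100:>9}{total_degrees:>9.0f}")
```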
A stem and leaf plot is a data plot that uses part of the data value as the
stem and part of the data value as the leaf to form groups or classes.
At an outpatient testing center, the number of cardiograms performed each day for
20 days is shown. Construct a stem and leaf plot for the data.
25 31 20 32 13 14 43 02 57 23
36 32 33 32 44 32 52 44 51 45
Solution:
3 A display can be made by using the leading digit as the stem and the trailing
digit as the leaf
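A minimal sketch of this construction for whole-number data: the data are sorted, and each value is split into a stem (all digits except the last) and a leaf (the last digit). The function name `stem_and_leaf` is illustrative, and the value written 02 in the data is entered as 2.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (value // 10) with sorted leaves (value % 10)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Cardiograms performed each day for 20 days, copied from the data above.
cardiograms = [
    25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
    36, 32, 33, 32, 44, 32, 52, 44, 51, 45,
]

plot = stem_and_leaf(cardiograms)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Because the stem is everything except the last digit, the same function also handles three-digit values: 325 has stem 32 and leaf 5.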
52 62 51 50 69 58 77 66 53 57
75 56 55 67 73 79 59 68 65 72
57 51 63 69 75 65 53 78 66 55
Solution:
When the data values are in the hundreds, such as 325, the stem is 32 and the
leaf is 5. For example, the stem and leaf plot for the data values 325, 327, 330,
332, 335, 341, 345, and 347 looks like this.
32 5 7
33 0 2 5
34 1 5 7
Exercises 2
Page 95: 1 - 6. Page 98: 17, 18, 20, 21