Lesson 2
Introduction to Statistics
STAT 1101
❑ When conducting a statistical study, the researcher must gather data for the particular variable
under study.
❑ For example, if a researcher wishes to study the number of people who were bitten by
poisonous snakes in a specific geographic area over the past several years, he or she has to
gather the data from various doctors, hospitals, or health departments.
❑ To describe situations, draw conclusions, or make inferences about events, the researcher must
organize the data in some meaningful way. The most convenient method of organizing data is
to construct a frequency distribution.
❑ After organizing the data, the researcher must present them so they can be understood by
those who will benefit from reading the study. The most useful method of presenting the data
is by constructing statistical charts and graphs.
❑ There are many different types of charts and graphs, and each one has a specific purpose.
Definition
A frequency distribution is the organization of raw data in table
form, using classes and frequencies.
❑For example, it can be stated that the majority of the wealthy people in the
study are 45 years old or older.
❑The classes in this distribution are 27–35, 36–44, etc.
❑These values are called class limits.
❑ The data values 27, 28, 29, 30, 31, 32, 33, 34, 35 can be tallied in the first class
(here we have only one person, who is 27 years old);
❑ 36, 37, 38, 39, 40, 41, 42, 43, 44 in the second class (3 people, ages 38, 38, and 44);
❑ and so on.
Definition
For example, data such as political affiliation, religious affiliation, or major field of study would
use categorical frequency distributions.
Example 2-1
Distribution of Blood Types Twenty-five army inductees were given a blood test
to determine their blood type. The data set is
❑ Step 4: Find the percentage of values in each class by using the formula
$$\% = \frac{f}{n} \times 100$$
where $f$ = frequency of the class and $n$ = total number of values.
❑ For example, in the class of type A blood, the percentage is
$$\% = \frac{5}{25} \times 100 = 20\%$$
❑ Find the totals for column D (percent).
For the sample, more people have type O
blood than any other type.
❑It is a good idea to add the percent column to make sure it sums to 100%.
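The tally-and-percentage procedure can be sketched in Python. This is a minimal illustration under stated assumptions: the list `blood_types` is a hypothetical sample (not the lesson's data set) chosen so that type A appears 5 times out of 25, as in the text, and the function name `categorical_distribution` is ours.

```python
from collections import Counter

# Hypothetical sample of 25 blood types (an assumption for illustration,
# not the lesson's data set); type A appears 5 times, as in the text.
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4


def categorical_distribution(values):
    """Return {class: (frequency, percent)} using percent = f * 100 / n."""
    n = len(values)
    return {cls: (f, f * 100 / n) for cls, f in Counter(values).items()}


table = categorical_distribution(blood_types)
for cls, (f, pct) in table.items():
    print(f"{cls:<3}{f:>4}{pct:>7.1f}%")

# Check: the percent column should sum to 100.
total_percent = sum(pct for _, pct in table.values())
print("Total percent:", total_percent)
```

Summing the percent column at the end mirrors the check recommended above.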
Chapter 2: Frequency Distributions and Graphs 13
Grouped Frequency Distributions
When the range of the data is large, the data must be grouped into classes that are
more than one unit in width, in what is called a grouped frequency distribution.
❑In this distribution, the values 58 and 64 of the first class are called class limits.
❑ The lower class limit is 58; it represents the smallest data value that can be
included in the class.
❑ The upper class limit is 64; it represents the largest data value that can be
included in the class.
❑Find the boundaries by subtracting 0.5 from 58 (the lower class limit) and adding
0.5 to 64 (the upper class limit).
Lower boundary = lower limit − 0.5 = 58 − 0.5 = 57.5
Upper boundary = upper limit + 0.5 = 64 + 0.5 = 64.5
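As a small sketch, the boundary rule can be written as a Python function. The names `class_boundaries` and `class_midpoint` and the `unit` parameter are our own assumptions; `unit=1` reflects data recorded to whole numbers, so the boundaries sit half a unit outside the limits.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a unit outside the class limits."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_midpoint(lower, upper):
    """Midpoint of a class from its limits (the boundaries give the same value)."""
    return (lower + upper) / 2


lower_boundary, upper_boundary = class_boundaries(58, 64)
print(lower_boundary, upper_boundary)   # 57.5 64.5
print(class_midpoint(58, 64))           # 61.0
```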
Find the midpoints of each class. Use the frequency distribution given in Example
2–4,
Example 2-2
The following data represent the record high temperatures in degrees Fahrenheit
(°F) for each of the 50 states.
Solution
Step 1 Determine the classes.
o Find the highest and lowest values: H = 134 and L = 100.
o Find the range: R = H − L = 134 − 100 = 34.
o Select the number of classes desired: in this case, 7 is chosen.
o Find the width by dividing the range by the number of classes and rounding up:
$$\text{Width} = \frac{R}{\text{number of classes}} = \frac{34}{7} \approx 4.9, \text{ rounded up to } 5$$
o Select a starting point: in this case, 100 is used.
Add the width to the lowest score taken as the starting point to get the lower limit of the next class. Keep adding until there are
7 classes, as shown: 100, 105, 110, etc.
o Find the upper class limits.
Subtract one unit from the lower limit of the second class to get the upper limit of the first
class. Then add the width to each upper limit to get all the upper limits.
105 − 1 = 104
The first class is 100–104, the second class is 105–109, etc.
o Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to
each upper class limit.
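Step 1 can be sketched in Python as follows. This is an illustration only: the function name `build_classes` and its parameters are hypothetical, and it assumes whole-number data so that one unit equals 1.

```python
import math


def build_classes(lowest, highest, num_classes, start=None):
    """Return (width, [(lower_limit, upper_limit), ...]) for whole-number data."""
    data_range = highest - lowest
    width = math.ceil(data_range / num_classes)   # round up, never down
    start = lowest if start is None else start
    lower_limits = [start + i * width for i in range(num_classes)]
    # Upper limit = lower limit of the next class minus one unit.
    classes = [(low, low + width - 1) for low in lower_limits]
    return width, classes


width, classes = build_classes(lowest=100, highest=134, num_classes=7)
# Boundaries: subtract 0.5 from each lower limit, add 0.5 to each upper limit.
boundaries = [(low - 0.5, high + 0.5) for low, high in classes]

print(width)          # 5
print(classes[0])     # (100, 104)
print(boundaries[0])  # (99.5, 104.5)
```

If the range happens to divide evenly by the number of classes, the last class may stop short of the highest value, so it is worth checking that `classes[-1][1] >= highest` before tallying.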
The frequency distribution shows that the class 109.5–114.5 contains the largest
number of temperatures (18) followed by the class 114.5–119.5 with 13
temperatures. Hence, most of the temperatures (31) fall between 110 and 119°F.
Definition
A cumulative frequency distribution is a distribution that shows
the number of data values less than or equal to a specific value
(usually an upper boundary).
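A cumulative frequency column is a running total of the class frequencies. The sketch below assumes the seven temperature classes from Example 2-2. Only the counts 18 and 13 are stated in the text, so the remaining frequencies are illustrative values chosen to total 50.

```python
from itertools import accumulate

# Upper class boundaries for the seven temperature classes.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
# Assumed frequencies: 18 and 13 come from the text; the others are
# illustrative values chosen so that the total is 50.
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Running total: each entry counts the values up to that upper boundary.
cumulative = list(accumulate(frequencies))

for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"Up to {boundary}: {cf}")
```

The last cumulative frequency should always equal the total number of data values, which makes a quick check on the table.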
When the range of the data values is relatively small, a frequency distribution can
be constructed using single data values for each class. This type of distribution is
called an ungrouped frequency distribution.
Example 2-3
The data represent the number of hours 30 college
students said they sleep per night.
Solution
Step 1 Determine the classes.
Since the range is small (10 − 5 = 5), classes consisting of a single data value can be used. They
are 5, 6, 7, 8, 9, and 10.
Note: If the data are continuous, class boundaries can be used. Subtract 0.5 from each class value
to get the lower class boundary, and add 0.5 to each class value to get the upper class boundary.
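An ungrouped distribution can be tallied directly, one class per data value. In the sketch below, the list `hours` is a hypothetical set of 30 answers (not the lesson's data), chosen only so that the values run from 5 to 10.

```python
from collections import Counter

# Hypothetical answers from 30 students (illustrative values only).
hours = [
    8, 6, 6, 8, 5, 7,
    7, 8, 7, 6, 6, 7,
    9, 7, 7, 6, 8, 10,
    9, 8, 7, 8, 9, 10,
    5, 8, 9, 8, 9, 10,
]

counts = Counter(hours)
# One class per data value, from the lowest value to the highest.
distribution = {
    value: counts.get(value, 0)
    for value in range(min(hours), max(hours) + 1)
}

for value, f in distribution.items():
    # The boundaries (value - 0.5, value + 0.5) apply only if the data
    # are treated as continuous.
    print(f"{value:>2}  {value - 0.5:>4}-{value + 0.5:<4}  {f}")
```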
❑After you have organized the data into a frequency distribution, you can present
them in graphical form.
❑ The purpose of graphs in statistics is to convey the data to the viewers in
pictorial form.
❑ It is easier for most people to comprehend the meaning of data presented
graphically than data presented numerically in tables or frequency distributions.
❑The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).
Definition
The histogram is a graph that displays the data by using contiguous
vertical bars (unless the frequency of a class is 0) of various heights
to represent the frequencies of the classes.
Example 2-4
Construct a histogram to represent the data
shown for the record high temperatures for
each of the 50 states.
Solution
✓ Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y
axis is always the vertical axis.
✓ Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
✓ Step 3 Using the frequencies as the heights, draw vertical bars for each class.
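The three steps amount to computing one bar per class from its boundaries and frequency. The sketch below does that in plain Python and prints a rough text rendering, with the bars lying sideways (a plotting library would draw them vertically). The frequencies are assumptions: only 18 and 13 are given in the text, and the rest are illustrative values totalling 50.

```python
# Class boundaries for the temperature classes; each bar spans one adjacent pair.
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
# Assumed frequencies: 18 and 13 come from the text; the others are illustrative.
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Each bar is (left edge, right edge, height). Adjacent bars share an edge,
# which is what makes the bars contiguous.
bars = [
    (boundaries[i], boundaries[i + 1], f)
    for i, f in enumerate(frequencies)
]

for left, right, height in bars:
    print(f"{left:>5}-{right:<5} | {'#' * height} {height}")
```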
Definition
The frequency polygon is a graph that displays the data by using
lines that connect points plotted for the frequencies at the
midpoints of the classes. The frequencies are represented by the
heights of the points.
Example 2-5
Construct a frequency polygon to represent
the data shown for the record high
temperatures for each of the 50 states.
Solution
Step 1 Find the midpoints of each class. Recall that midpoints are found by adding the
upper and lower boundaries and dividing by 2:
$$\frac{99.5 + 104.5}{2} = 102 \qquad \frac{104.5 + 109.5}{2} = 107$$
✓ Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then
use a suitable scale on the y axis for the frequencies.
✓ Step 3 Using the midpoints for the x values and the frequencies as the y values, plot
the points.
✓ Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph, at the same distance that the previous and next
midpoints would be located.
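The points of the frequency polygon can be computed before any drawing is done. In this sketch the frequencies are assumptions (only 18 and 13 are stated in the text), and the two anchor points at frequency 0 sit one class width before the first midpoint and one class width after the last.

```python
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
# Assumed frequencies: 18 and 13 come from the text; the others are illustrative.
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Step 1: midpoint = (lower boundary + upper boundary) / 2.
midpoints = [
    (boundaries[i] + boundaries[i + 1]) / 2
    for i in range(len(frequencies))
]

# Steps 3 and 4: plot (midpoint, frequency) and anchor both ends to the
# x axis where the previous and next midpoints would fall.
width = boundaries[1] - boundaries[0]
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Connecting consecutive entries of `points` with line segments gives the polygon.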
Definition
The ogive is a graph that represents the cumulative frequencies for
the classes in a frequency distribution.
Example 2-6
Construct an ogive for the frequency
distribution of the data of the record high
temperatures.
Solution
Step 1 Find the cumulative frequency for each class.
✓ Step 2 Draw the x and y axes. Label the x axis with the class boundaries, and then use a
suitable scale on the y axis for the cumulative frequencies.
✓ Step 3 Plot the cumulative frequency at each upper class boundary.
✓ Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with
line segments. Then extend the graph to the first lower class boundary, 99.5, on the x
axis.
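The ogive's points pair each class boundary with the cumulative frequency reached there, starting from 0 at the first lower boundary. The sketch below assumes the same illustrative frequencies as before (only 18 and 13 are stated in the text).

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
# Assumed frequencies: 18 and 13 come from the text; the others are illustrative.
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Step 1: cumulative frequency for each class.
cumulative = list(accumulate(frequencies))

# The first lower boundary (99.5) gets cumulative frequency 0; every upper
# boundary gets the running total up to and including its class.
points = list(zip(boundaries, [0] + cumulative))

for x, y in points:
    print(x, y)
```

Because cumulative frequencies never decrease, the line segments joining these points can only rise or stay level from left to right.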
In addition to the histogram, frequency polygon, and ogive, several other types of graphs are
often used in statistics. They are the bar graph, Pareto chart, time series graph, pie graph, and dotplot.
Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to represent
the data. A bar graph can be drawn using either horizontal or vertical bars.
Definition
A bar graph represents the data by using vertical or horizontal bars
whose heights or lengths represent the frequencies of the data.
1. Draw and label the x and y axes. For the horizontal bar graph place the
frequency scale on the x axis, and for the vertical bar graph place the
frequency scale on the y axis.
2. Draw the bars corresponding to the frequencies.
❑ Bar graphs can also be used to compare data for two or more groups. These
types of bar graphs are called compound bar graphs.
❑ Consider the following data for the number (in millions) of never married
adults in the United States.
Definition
A pie graph is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the
distribution.
Example 2-11
❑ This frequency distribution shows the
number of pounds of each snack food
eaten during the Super Bowl.
❑ Construct a pie graph for the data.
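Before drawing, each category's frequency is converted to a percentage and to a wedge angle. The sketch below uses hypothetical snack amounts (illustrative numbers, not the lesson's table). The conversion to degrees is not stated in the lesson text; it follows from a full circle having 360°, so a wedge's angle is its share of 360.

```python
# Hypothetical amounts for each snack food (illustrative values only).
snacks = {
    "Potato chips": 11.2,
    "Tortilla chips": 8.2,
    "Pretzels": 4.3,
    "Popcorn": 3.8,
    "Snack nuts": 2.5,
}

total = sum(snacks.values())

# For each category: (percent of the total, central angle in degrees).
wedges = {
    name: (f / total * 100, f / total * 360)
    for name, f in snacks.items()
}

for name, (percent, degrees) in wedges.items():
    print(f"{name:<15}{percent:>6.1f}%{degrees:>7.1f} deg")
```

The percentages should sum to 100 and the angles to 360, apart from rounding.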
Example 2-1
Distribution of Blood Types Twenty-five army inductees were given a blood test
to determine their blood type. The data set is
5. Select cell C2. From the toolbar, select the Formulas tab.
6. Select the Insert Function icon, then select the Statistical category in the
Insert Function dialog box.
7. Select the COUNTIF function from the function name list.
8. In the dialog box, type A1:A25 in the Range box. Type in the blood type “A” in
quotes in the Criteria box. (=COUNTIF(A1:A25,"A"))
9. The count or frequency of the number of data corresponding to the blood
type should appear below the input. Repeat for the remaining blood types.
Categorical Frequency Table
10. After all the data have been counted, select cell C6 in the worksheet.
11. From the toolbar select Formulas, then AutoSum and type in C2:C5 to insert
the total frequency into cell C6.
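For readers who want to check the worksheet outside Excel, the same counts can be reproduced in Python. This is a rough analogue, not part of the Excel procedure: `column_a` is a hypothetical stand-in for cells A1:A25, `list.count` plays the role of COUNTIF, and `sum` plays the role of AutoSum.

```python
# Hypothetical contents of cells A1:A25 (illustrative, not the lesson's data).
column_a = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

blood_types = ["A", "B", "O", "AB"]

# Analogue of =COUNTIF(A1:A25,"A"), repeated for each blood type.
frequencies = {bt: column_a.count(bt) for bt in blood_types}

# Analogue of AutoSum over C2:C5.
total = sum(frequencies.values())

print(frequencies)
print(total)
```

One difference to keep in mind: COUNTIF ignores letter case when matching text, while `list.count` is case sensitive, so `"a"` and `"A"` would be counted separately here.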
Grouped Frequency Distributions
Example 2-2
The following data represent the record high temperatures in degrees Fahrenheit
(°F) for each of the 50 states.
6. In the Histogram dialog box, type A1:A50 in the Input Range box and type B2:B8
in the Bin Range box.
7. Select New Worksheet Ply, and check the Cumulative Percentage option. Click
[OK].
6. In the Histogram dialog box, type A1:A50 in the Input Range box and type B2:B8
in the Bin Range box.
7. Select New Worksheet Ply, and check the Chart output option. Click [OK].
We will need to edit the graph so that the midpoints are on the horizontal axis.
1. Right click the mouse on any region of the
chart.
2. Choose Select Data.
3. Select Edit below the Horizontal (Category) Axis
Labels panel on the right.
4. Press and hold the left mouse button, and drag
over the midpoints (not including the label) for
the Axis label range, then click [OK].
5. Click [OK] on the Select Data Source box.
Constructing an Ogive
To create an ogive, use the upper class boundaries (horizontal axis) and cumulative frequencies
(vertical axis) from the frequency distribution.
1. Type the upper class boundaries (including a class with frequency 0
before the lowest class to anchor the graph to the horizontal axis)
and corresponding cumulative frequencies into adjacent columns of
an Excel worksheet.
2. Press and hold the left mouse button, and drag over the Cumulative
Frequencies from column B.
Example 2-12
Construct a pie chart for the calls received each
shift by a local municipality for a recent year
using Excel.
Solution
1. Enter the shifts from the data into
column A of a new worksheet.
2. Enter the frequencies corresponding
to each shift in column B.
3. Highlight the data in columns A and B
and select Insert from the toolbar;
then select the Pie chart type.
Construct a Bar Chart
Example 2-12
Construct a bar chart for the calls received each
shift by a local municipality for a recent year
using Excel.
Solution
1. Highlight the data in columns A and B and select Insert from the toolbar;
then select the Clustered Column or Clustered Bar type.