
BIOL 2163 Lecture 2 - Summarizing and Graphing Data

Uploaded by

Zara16

Chapter 2

Summarizing and Graphing Data

2-1 Overview
2-2 Frequency Distributions
2-3 Visualizing Data - Histograms
2-3 Visualizing Data - Statistical Graphics
2-3 Supplemental - Critical Thinking: Bad Graphs
Section 2-1
Overview

Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Overview
Descriptive vs. Inferential Statistics

Descriptive Statistics: Methods used to summarize or describe the important characteristics of a set of data.
Inferential Statistics: Methods that use sample data to make inferences about a population.

In this lecture we look at descriptive statistics.


Overview
Important Characteristics of Data
1. Center: A representative or average value that indicates
where the middle of the data set is located.
2. Variation: A measure of the amount that the data values vary.
3. Distribution: The nature or shape of the spread of data over
the range of values (such as bell-shaped, uniform, or skewed).
4. Outliers: Sample values that lie very far away from the vast majority of other sample values.
5. Time: Changing characteristics of the data over time.
Section 2-2
Frequency Distributions

Key Concept

When working with large data sets, it is often helpful to organize and summarize data by constructing a table called a frequency distribution, defined later. Because computer software and calculators can generate frequency distributions, the details of constructing them are not as important as what they tell us about data sets. A frequency distribution helps us understand the nature of the distribution of a data set.
Definition

Frequency Distribution (or Frequency Table)
shows how a data set is partitioned among several categories (or classes) by listing all of the categories along with the number of data values in each of the categories.
Pulse Rates of Females and Males

Original Data – Appendix B, Data Set 1


Frequency Distribution
Pulse Rates of Females

The frequency for a particular class is the number of original values that fall into that class.
Frequency Distributions

Definitions
Lower Class Limits
are the smallest numbers that can actually belong to different classes

Upper Class Limits
are the largest numbers that can actually belong to different classes
Class Boundaries
are the numbers used to separate classes, but without the gaps created by class limits

Class boundaries: 59.5, 69.5, 79.5, 89.5, 99.5, 109.5, 119.5, 129.5
Class Midpoints
are the values in the middle of the classes and can be found by adding the lower class limit to the upper class limit and dividing the sum by two

Class midpoints: 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5
Class Width
is the difference between two consecutive lower class limits or two consecutive lower class boundaries

Class width: 10 for every class
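The definitions above can be checked with a short sketch. The class limits used here (60-69, 70-79, ..., 120-129) are an assumption inferred from the boundaries and midpoints listed on the slides, and the function names are hypothetical.

```python
# Class limits inferred from the listed boundaries and midpoints (an assumption).
class_limits = [(lower, lower + 9) for lower in range(60, 130, 10)]

def class_boundaries(limits):
    """Boundaries sit halfway across the gap between adjacent classes."""
    gap = limits[1][0] - limits[0][1]      # e.g. 70 - 69 = 1
    half = gap / 2
    lows = [lower - half for lower, _ in limits]
    return lows + [limits[-1][1] + half]   # close the last class

def class_midpoints(limits):
    """Midpoint = (lower class limit + upper class limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in limits]

def class_widths(limits):
    """Width = difference between consecutive lower class limits."""
    lowers = [lower for lower, _ in limits]
    return [b - a for a, b in zip(lowers, lowers[1:])]

boundaries = class_boundaries(class_limits)
midpoints = class_midpoints(class_limits)
widths = class_widths(class_limits)
print(boundaries, midpoints, widths, sep="\n")
```

Seven classes give eight boundaries, seven midpoints, and six width differences.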
Reasons for Constructing
Frequency Distributions
1. Large data sets can be summarized.

2. We can analyze the nature of data.

3. We have a basis for constructing important graphs.
Constructing A Frequency Distribution
1. Determine the number of classes (should be between 5 and 20).

2. Calculate the class width (round up).

$$\text{class width} \approx \frac{(\text{maximum value}) - (\text{minimum value})}{\text{number of classes}}$$

3. Starting point: Choose the minimum data value or a convenient value below it as the first lower class limit.
4. Using the first lower class limit and class width, proceed to list the other
lower class limits.
5. List the lower class limits in a vertical column and proceed to enter the
upper class limits.
6. Take each individual data value and put a tally mark in the appropriate
class. Add the tally marks to get the frequency.
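The six steps can be sketched in a few lines. This is a minimal illustration, assuming integer data and starting at the minimum value; the function name and the sample pulse rates are made up for the example and are not the Appendix B data.

```python
import math

def frequency_distribution(values, num_classes):
    """Tally integer data into equal-width classes: [((lower, upper), freq), ...]."""
    lowest, highest = min(values), max(values)
    # Step 2: round the quotient up. floor + 1 also bumps an exact quotient,
    # so the maximum value still fits in the last class (an assumption).
    width = math.floor((highest - lowest) / num_classes) + 1
    # Step 3: use the minimum data value as the first lower class limit.
    first_lower = lowest
    # Steps 4-5: list lower limits, then upper limits one unit below the next lower limit.
    limits = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
              for i in range(num_classes)]
    # Step 6: tally each value into its class.
    freqs = [0] * num_classes
    for v in values:
        freqs[(v - first_lower) // width] += 1
    return list(zip(limits, freqs))

# Hypothetical pulse rates, made up for illustration.
sample = [60, 64, 68, 72, 72, 76, 80, 84, 88, 96, 104, 124]
table = frequency_distribution(sample, num_classes=7)
for (lower, upper), freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With these values the range is 64, so 64 / 7 rounds up to a class width of 10 and the classes run from 60-69 to 120-129.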
Relative Frequency Distribution

includes the same class limits as a frequency distribution, but the frequency of a class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent)

$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$$

$$\text{percentage frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}} \times 100\%$$
Relative Frequency Distribution

Total frequency = 40; for a class with frequency 12, 12/40 × 100% = 30%
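A small sketch of the relative and percentage frequency formulas. Only the total of 40 and the class frequency of 12 come from the slide; the remaining frequencies are assumed for illustration.

```python
# Illustrative class frequencies: only the total (40) and the first
# frequency (12) are taken from the slide; the rest are assumed.
frequencies = [12, 14, 11, 1, 1, 0, 1]

total = sum(frequencies)
relative = [f / total for f in frequencies]          # proportions
percentage = [100 * f / total for f in frequencies]  # percents

for f, r, p in zip(frequencies, relative, percentage):
    print(f"{f:>3}  {r:.3f}  {p:.1f}%")
```

The relative frequencies sum to 1 and the percentages to 100%, which is a quick check on the arithmetic.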


Cumulative Frequency Distribution
The cumulative frequency for a class is the
sum of the frequencies for that class and all
previous classes.
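As a sketch, cumulative frequencies are a running total of the class frequencies. The frequencies below are assumed for illustration.

```python
from itertools import accumulate

# Assumed class frequencies, for illustration only.
frequencies = [12, 14, 11, 1, 1, 0, 1]

# Each entry is the sum of that class's frequency and all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of data values.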

Cumulative Frequencies
Frequency Tables
Critical Thinking: Interpreting
Frequency Distributions
In later chapters, there will be frequent reference to data with a normal distribution. One key characteristic of a normal distribution is that it has a “bell” shape.

• The frequencies start low, then increase to one or two high frequencies, then decrease to a low frequency.
• The distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum.
The Normal Distribution
Gaps
• The presence of gaps can show that we have data from two or more different populations.
• However, the converse is not true, because data from different populations do not necessarily result in gaps.
Recap

In this section we have discussed
• Important characteristics of data
• Frequency distributions
• Procedures for constructing frequency distributions
• Relative frequency distributions
• Cumulative frequency distributions
Section 2-3
Visualizing Data - Histograms

Key Concept

We use a visual tool called a histogram to analyze the shape of the distribution of the data.
Histogram

A graph consisting of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents the classes of quantitative data values and the vertical scale represents the frequencies. The heights of the bars correspond to the frequency values.
Histogram
Basically a graphic version of a frequency
distribution.
Histogram
The bars on the horizontal scale are labeled with
one of the following:
(1) Class boundaries
(2) Class midpoints
(3) Lower class limits (introduces a small error)

Horizontal Scale for Histogram: Use class boundaries or class midpoints.
Vertical Scale for Histogram: Use the class frequencies.
Relative Frequency Histogram
Has the same shape and horizontal scale as a histogram, but the vertical scale is
marked with relative frequencies instead of actual frequencies
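A rough text-only sketch of the idea, with the histogram turned on its side so that each bar is a row. The midpoints follow the earlier slides; the frequencies and the function name are assumptions for illustration.

```python
def text_histogram(midpoints, frequencies, symbol="#"):
    """One row per class: midpoint label, then a bar as long as the frequency."""
    return [f"{m:>6} | {symbol * f} ({f})" for m, f in zip(midpoints, frequencies)]

midpoints = [64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5]
frequencies = [12, 14, 11, 1, 1, 0, 1]   # assumed, for illustration

lines = text_histogram(midpoints, frequencies)
print("\n".join(lines))

# A relative frequency histogram rescales every bar by the same total,
# so the shape is unchanged.
total = sum(frequencies)
relative_heights = [f / total for f in frequencies]
```

Because every height is divided by the same total, the tallest class stays the tallest and the ratios between bars are preserved.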
Critical Thinking
Interpreting Histograms
The objective is not simply to construct a histogram, but rather to understand something about the data.

When graphed, a normal distribution has a “bell” shape. Characteristics of the bell shape are

(1) The frequencies increase to a maximum, and then decrease, and

(2) symmetry, with the left half of the graph roughly a mirror image of the right half.

The histogram on the next slide illustrates this.


Critical Thinking
Interpreting Histograms
Recap

In this section we have discussed
• Histograms
• Relative Frequency Histograms
Section 2-3 Continued
Visualizing Data - Statistical Graphics

Key Concept

This section discusses other types of statistical graphs.
Our objective is to identify a suitable graph for representing the data set. The graph should be effective in revealing the important characteristics of the data.
Frequency Polygon
Uses line segments connected to points directly above class midpoint values
Relative Frequency Polygon
Uses relative frequencies (proportions or percentages) for the vertical scale.
Cumulative Frequency Graph (or
Ogive)
A line graph that depicts cumulative frequencies
Dot Plot
Consists of a graph in which each data value is plotted as a point (or dot) along a
scale of values. Dots representing equal values are stacked.
Stemplot (or Stem-and-Leaf Plot)
Represents quantitative data by separating each value into two parts: the stem
(such as the leftmost digit) and the leaf (such as the rightmost digit)
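A minimal sketch of splitting values into stems and leaves, assuming non-negative integers with the last digit as the leaf. The sample values and the function name are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

# Hypothetical pulse rates, made up for illustration.
sample = [60, 64, 68, 72, 72, 76, 80, 84, 88, 96, 104, 124]
plot = stemplot(sample)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```

Unlike a histogram, the stemplot keeps every original data value while still showing the shape of the distribution.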

Pulse Rates of Females


Bar Graph

Uses bars of equal width to show frequencies of categories of qualitative data. Vertical scale represents frequencies or relative frequencies. Horizontal scale identifies the different categories of qualitative data.
A multiple bar graph has two or more sets of bars, and is used to compare two or more data sets.
Multiple Bar Graph
Median Income of Males and Females
Pareto Chart
A bar graph for qualitative data, with the bars arranged in descending order
according to frequencies
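The ordering behind a Pareto chart can be sketched by counting categories and sorting them by frequency. The categories below are hypothetical.

```python
from collections import Counter

# Hypothetical qualitative data (categories made up for illustration).
responses = ["noise", "parking", "noise", "litter", "parking", "noise"]

counts = Counter(responses)
# most_common() lists categories from highest to lowest frequency,
# which is the left-to-right bar order of a Pareto chart.
pareto_order = counts.most_common()
print(pareto_order)
```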
Pie Chart
A graph depicting qualitative data as slices of a circle, with the size of each slice proportional to the frequency count
Scatter Plot (or Scatter Diagram)
A plot of paired (x,y) data with a horizontal x-axis and a vertical y-axis. Used to
determine whether there is a relationship between the two continuous variables
Time-Series Graph
A graph of time-series data, which are data that have been collected at different points in time
Important Principles
Suggested by Edward Tufte
For small data sets of 20 values or fewer, use a table
instead of a graph.
A graph of data should make the viewer focus on
the true nature of the data, not on other elements,
such as eye-catching but distracting design features.
Do not distort data; construct a graph to reveal the
true nature of the data.
Almost all of the ink in a graph should be used for
the data, not the other design elements.
Important Principles
Suggested by Edward Tufte
Don’t use screening consisting of features such as
slanted lines, dots, or cross-hatching, because they
create the uncomfortable illusion of movement.
Don’t use areas or volumes for data that are actually
one-dimensional in nature. (Don’t use drawings of
dollar bills to represent budget amounts for
different years.)
Never publish pie charts, because they waste ink on
nondata components, and they lack appropriate
scale.
Car Reliability Data
Recap
In this section we saw that graphs are excellent
tools for describing, exploring and comparing data.
Describing data: Histogram - consider distribution,
center, variation, and outliers.
Exploring data: features that reveal some useful
and/or interesting characteristic of the data set.
Comparing data: Construct similar graphs to
compare data sets.
Section 2-3 Supplemental
Critical Thinking: Bad Graphs

Key Concept

Some graphs are bad in the sense that they contain errors.
Some are bad because they are technically correct, but misleading.
It is important to develop the ability to recognize bad graphs and identify exactly how they are misleading.
Nonzero Axis
Graphs are misleading when one or both of the axes begin at some value other than
zero, so that differences are exaggerated.
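A small numeric sketch of the exaggeration, using hypothetical values and a hypothetical helper name. It compares the ratio of two drawn bar heights when the vertical axis starts at zero and when it starts higher.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical values, chosen only to illustrate the effect.
low, high = 50.0, 55.0

true_ratio = apparent_ratio(low, high)                   # axis starts at 0
truncated = apparent_ratio(low, high, axis_start=45.0)   # axis starts at 45
print(true_ratio, truncated)
```

Here a 10% difference is drawn as one bar twice the height of the other once the axis starts at 45.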
Pictographs
Pictographs are drawings of objects. Three-dimensional
objects, such as money bags, stacks of coins, army tanks (for army
expenditures), people (for population sizes), barrels (for oil
production), and houses (for home construction), are
commonly used to depict data.
These drawings can create false impressions that distort the
data.
If you double each side of a square, the area does not
merely double; it increases by a factor of four; if you double
each side of a cube, the volume does not merely double; it
increases by a factor of eight.
Pictographs using areas and volumes can therefore be very
misleading.
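The scaling argument can be checked directly: multiplying every side by k multiplies area by k squared and volume by k cubed. The function name is hypothetical.

```python
def scale_factors(k):
    """Growth in length, area, and volume when every side is multiplied by k."""
    return {"length": k, "area": k ** 2, "volume": k ** 3}

doubled = scale_factors(2)      # area x4, volume x8
quadrupled = scale_factors(4)   # area x16, volume x64
print(doubled, quadrupled)
```

A value that is 4 times as large, drawn as a box scaled by 4 in every dimension, looks 64 times as large.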
Annual Incomes of Groups with Different
Education Levels

Bars have same width, too busy, too difficult to understand.


Annual Incomes of Groups with Different
Education Levels

Misleading. Depicts one-dimensional data with three-dimensional boxes. Last box is 64 times as large as first box, but income is only 4 times as large.
Annual Incomes of Groups with Different
Education Levels

Fair, objective, unencumbered by distracting features.


Daily Oil Consumption – USA vs. Japan

Part (b) is designed to exaggerate the difference by increasing each dimension in proportion to the actual amounts of oil consumption.
Misleading. Depicts one-dimensional data with three-dimensional objects.
