Describing Data New
Frequency Table
Frequency Distribution
A frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class. The table shows a frequency distribution for a set of quantitative data.
• In this distribution, the values 24 and 30 of the first class are called “class limits”.
• 24 is the “lower class limit” and 30 is the “upper class limit.”
• The numbers in the second column are called class boundaries.
• The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
• Lower boundary = lower limit - 0.5 (for whole-number data)
• Upper boundary = upper limit + 0.5 (for whole-number data)
❑ Class limits should have the same decimal place value as the data, but the class
boundaries should have one additional place value and end in a 5.
For example: class limits 7.8-8.8 give class boundaries 7.75-8.85.
➢ Lower boundary = lower limit - 0.05 = 7.8 - 0.05 = 7.75
➢ Upper boundary = upper limit + 0.05 = 8.8 + 0.05 = 8.85
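The boundary rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the caller passes the number of decimal places in the data; the function name `class_boundaries` is hypothetical. The half-unit is 0.5 for whole numbers and 0.05 for data recorded to one decimal place, matching both examples.

```python
def class_boundaries(lower_limit, upper_limit, decimals=0):
    """Return (lower boundary, upper boundary) for one class (hypothetical helper).

    `decimals` is the number of decimal places in the data: 0 for whole
    numbers (boundaries move out by 0.5), 1 for tenths (by 0.05), and so on.
    """
    half_unit = 0.5 * 10 ** (-decimals)
    # Round to one extra decimal place to hide floating-point noise,
    # so the boundaries "end in a 5" as the rule requires.
    lower = round(lower_limit - half_unit, decimals + 1)
    upper = round(upper_limit + half_unit, decimals + 1)
    return lower, upper


print(class_boundaries(24, 30))        # whole-number data -> (23.5, 30.5)
print(class_boundaries(7.8, 8.8, 1))   # one decimal place -> (7.75, 8.85)
```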
Rules for Classes in Grouped Frequency Distributions
1. There should be 5-20 classes.
2. The class width should preferably be an odd number, so that the class midpoints have the same place value as the data.
3. The classes must be mutually exclusive.
Class Interval
a) Inclusive Class Interval: both the lower and the upper class limits are included.
   E.g.: class A 0-9
         class B 10-19
         class C 20-29
   For Class B, lower class boundary = 9.5 and upper class boundary = 19.5.
   Inclusive class interval = upper class boundary - lower class boundary = 19.5 - 9.5 = 10
b) Exclusive Class Interval: the lower limit is included, but the upper limit is excluded.
   E.g.: class A 0-10
         class B 10-20
         class C 20-30
   For Class B, lower class limit (LCL) = 10 and upper class limit (UCL) = 20 (not included).
   Exclusive class interval = UCL - LCL = 20 - 10 = 10
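The two ways of measuring a class interval can be compared in code. This is a sketch under the assumption of whole-number data (so inclusive boundaries sit 0.5 outside the limits); all function names are hypothetical. The membership test shows why, in the exclusive scheme, the value 20 belongs to class C rather than class B.

```python
def inclusive_interval(lower_limit, upper_limit):
    """Width of an inclusive class such as 10-19 (both limits belong to the class).

    Assumes whole-number data, so each boundary lies 0.5 outside its limit.
    """
    lower_boundary = lower_limit - 0.5
    upper_boundary = upper_limit + 0.5
    return upper_boundary - lower_boundary


def exclusive_interval(lower_limit, upper_limit):
    """Width of an exclusive class such as 10-20 (the upper limit is not included)."""
    return upper_limit - lower_limit


def in_exclusive_class(x, lower_limit, upper_limit):
    """True if x lies in [lower_limit, upper_limit): lower included, upper excluded."""
    return lower_limit <= x < upper_limit


print(inclusive_interval(10, 19))      # 19.5 - 9.5
print(exclusive_interval(10, 20))      # 20 - 10
print(in_exclusive_class(20, 10, 20))  # False: 20 is counted in class C
```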
Hint:
Grouped Frequency Distribution
• Group the data by class intervals
• Report the frequency for each interval
• Loses information: the exact values are not shown
• General rules:
  • each interval has the same width
  • intervals are consecutive and do not overlap
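The general rules above (equal widths, consecutive, non-overlapping) can be illustrated with a small counting routine. This is a minimal sketch assuming exclusive intervals of the form [lower, lower + width); the function name and the scores are hypothetical, used only for illustration. Note that the output keeps only counts per class, which is exactly the information loss mentioned above.

```python
def grouped_frequencies(data, start, width, num_classes):
    """Count observations in equal-width, non-overlapping classes.

    Classes are [start, start + width), [start + width, start + 2*width), ...
    (lower limit included, upper limit excluded), so each value lands in
    exactly one class. Values outside the covered range raise an error.
    """
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} is outside the classes")
        counts[index] += 1
    # Pair each class's limits with its frequency.
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


# Hypothetical scores.
scores = [3, 7, 12, 15, 15, 18, 21, 24, 27, 29]
for lower, upper, count in grouped_frequencies(scores, 0, 10, 3):
    print(f"{lower}-{upper}: {count}")
```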
Distributions as graphs
• Summarize data
• Focus on clear communication
• Bar graphs
  • nominal or ordinal data
  • discrete variables
• Histograms & frequency polygons
  • interval/ratio data
  • continuous & discrete variables
• Summarize important characteristics of the data:
  1. What is the shape of the distribution?
  2. Where is the middle of the distribution?
  3. How wide is the distribution?
Shapes of distributions
• Unimodal distribution
  • a single value is the most frequent
• Bimodal (or multimodal) distribution
  • two (or more) values occur most frequently
  • may indicate relevant subgroups in the data
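The unimodal/bimodal definitions above can be applied literally by counting how many values tie for the highest frequency. This is a sketch with hypothetical function names; it follows the "most frequent value" wording only, so data in which every value occurs once would be labeled multimodal, and a histogram with two separated peaks of unequal height would need a peak-based reading instead.

```python
from collections import Counter


def modes(data):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def modality(data):
    """Label data by the number of most frequent values (a literal count, not a peak detector)."""
    n = len(modes(data))
    return {1: "unimodal", 2: "bimodal"}.get(n, "multimodal")


print(modes([2, 3, 3, 3, 4]), modality([2, 3, 3, 3, 4]))
print(modes([1, 1, 1, 2, 5, 5, 5]), modality([1, 1, 1, 2, 5, 5, 5]))
```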
Shape
• A graph shows the shape of the distribution.
• A distribution is symmetrical if the left side of the graph is (roughly) a mirror image
of the right side.
• One example of a symmetrical distribution is the bell-shaped normal distribution.
• On the other hand, distributions are skewed when scores pile up on one side of the distribution, leaving a "tail" of a few extreme values on the other side.
Positively and Negatively Skewed Distributions
• In a positively skewed distribution, the scores tend to pile up on the left side of the distribution, with the tail tapering off to the right.
• In a negatively skewed distribution, the scores tend to pile up on the right side, and the tail points to the left.
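The notes describe skew visually. One common numerical check, not part of these notes and swapped in here as an illustration, is the moment coefficient of skewness g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean. Under that measure, a positive value indicates a tail to the right and a negative value a tail to the left. The function name and data are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form).

    Positive -> tail to the right; negative -> tail to the left;
    near 0 -> roughly symmetric.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_tail = [1, 2, 2, 3, 3, 3, 4, 12]   # scores pile up on the left
left_tail = [-x for x in right_tail]     # mirror image: scores pile up on the right
print(skewness(right_tail) > 0, skewness(left_tail) < 0)  # True True
```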
Shapes of Frequency Curves
• J-shaped: few data values on the left side, with frequencies increasing as one moves to the right.
• Reverse J-shaped: the opposite of the J-shaped distribution; many data values on the left side, with frequencies decreasing toward the right.
Constructing a Frequency Table – Example
⚫ Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose the smallest k such that \(2^k > n\).
There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then \(2^6 = 64\), somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then \(2^7 = 128\), which is greater than 80. So the recommended number of classes is 7.
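The “2 to the k rule” in Step 1 amounts to a short loop: keep increasing k until \(2^k\) exceeds n. A minimal sketch; the function name is hypothetical.

```python
def number_of_classes(n):
    """'2 to the k rule': return the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


print(number_of_classes(80))  # 2**6 = 64 <= 80, 2**7 = 128 > 80 -> 7
```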
⚫ Step 2: Determine the class interval or width.
The formula is \(i \ge \frac{H - L}{k}\), where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
($35,925 - $15,546)/7 = $20,379/7 ≈ $2,911
Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.
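Step 2 can be checked with the same numbers. This sketch computes (H - L)/k and rounds it up to a convenient multiple; the default multiple of 100 is an assumption chosen here (the notes allow 10 or 100), and the function name is hypothetical. With the vehicle data it reproduces the $3,000 width.

```python
import math


def class_width(highest, lowest, k, round_to=100):
    """Return the raw width (H - L) / k and that width rounded UP to a multiple of round_to."""
    raw = (highest - lowest) / k
    return raw, math.ceil(raw / round_to) * round_to


raw, width = class_width(35_925, 15_546, 7)
print(round(raw, 2), width)  # 2911.29 3000
```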
⚫ Step 3: Set the individual class limits
Graphs of a Frequency Distribution for Quantitative Data
The three commonly used graphic forms are:
⚫ Histograms
⚫ Frequency polygons
⚫ Cumulative frequency distributions
Histogram
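Frequency Polygon
A frequency polygon is a line graph that displays the frequency of each class at its class midpoint. A line graph that displays the cumulative frequency of each class at its upper class boundary is a cumulative frequency polygon, or ogive.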
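A frequency polygon plots each class frequency at the class midpoint, while a cumulative frequency graph (ogive) plots the running total at each upper class boundary, starting from 0 at the first lower boundary. The sketch below only computes the points to be joined by lines; the function name and table are hypothetical. A drawn frequency polygon is also usually anchored to the horizontal axis at each end, which this sketch omits.

```python
def polygon_and_ogive_points(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency) tuples.

    Returns frequency-polygon points (class midpoint, frequency) and
    ogive points (upper class boundary, cumulative frequency).
    """
    polygon = []
    ogive = [(classes[0][0], 0)]  # the ogive starts at 0 at the first lower boundary
    running_total = 0
    for lower, upper, freq in classes:
        polygon.append(((lower + upper) / 2, freq))  # plotted at the class midpoint
        running_total += freq
        ogive.append((upper, running_total))         # plotted at the upper boundary
    return polygon, ogive


# Hypothetical table for whole-number classes 0-9, 10-19, 20-29.
table = [(-0.5, 9.5, 4), (9.5, 19.5, 7), (19.5, 29.5, 2)]
polygon, ogive = polygon_and_ogive_points(table)
print(polygon)  # [(4.5, 4), (14.5, 7), (24.5, 2)]
print(ogive)    # [(-0.5, 0), (9.5, 4), (19.5, 11), (29.5, 13)]
```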
Line Graph
Line Plot
Box Plot
Scatter Plot
No Correlation
If there is no linear correlation at all, the correlation coefficient (r) is 0.
Strong linear correlation:
The closer the correlation coefficient is to 1 or -1, the stronger the correlation, that is, the stronger the linear relationship between the variables.
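The notes do not give a formula for the correlation coefficient; the sketch below assumes the usual Pearson coefficient r, computed from deviations about the means. It shows r = 1 and r = -1 for perfect positive and negative straight-line relationships and r = 0 when there is no linear trend. The function name and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r, which always lies between -1 and 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive linear relationship)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative linear relationship)
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0  (no linear trend)
```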
Pie (circle) charts
⚫ A way of summarizing a set of categorical data or displaying the different values
of a given variable (e.g. percentage distribution).
⚫ A circle is divided into a series of segments. Each segment represents a particular
category.
⚫ The area of each segment is the same proportion of a circle’s area as the category
is of the total data set.
⚫ Pie charts are quite popular because the circle provides a visual concept of the whole (100%).
⚫ Best used for displaying statistical information when there are no more than six components; otherwise, the resulting picture will be too complex to understand.
⚫ Pie charts are not useful when the values of each component are similar because
it is difficult to see the differences between slice sizes.
⚫ A pie graph is a circle that is divided into sections or wedges according to the
percentage of frequencies in each category of the distribution.
⚫ The purpose of the pie graph is to show the relationship of the parts to the whole
by visually comparing the sizes of the sections.
⚫ Percentages or proportions can be used.
⚫ The variable is nominal or categorical.
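The proportion rule for pie charts translates directly into slice angles: each category's share of the 360 degrees equals its share of the total frequency. A minimal sketch; the function name and the counts are hypothetical.

```python
def pie_slices(frequencies):
    """Map each category to (percentage, central angle in degrees)."""
    total = sum(frequencies.values())
    return {
        category: (100 * f / total, 360 * f / total)
        for category, f in frequencies.items()
    }


# Hypothetical counts for a nominal variable.
votes = {"A": 10, "B": 5, "C": 25}
for category, (pct, angle) in pie_slices(votes).items():
    print(f"{category}: {pct:.1f}% -> {angle:.0f} degrees")
# A: 25.0% -> 90 degrees
# B: 12.5% -> 45 degrees
# C: 62.5% -> 225 degrees
```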
Pareto Chart
A Pareto chart is used to represent a frequency distribution
for a categorical variable, and the frequencies are displayed
by the heights of vertical bars, which are arranged in order
from highest to lowest.
When the variable displayed on the horizontal axis is qualitative or categorical, a Pareto chart can be used.
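What distinguishes a Pareto chart from an ordinary bar chart is the ordering of the bars from highest to lowest frequency. The sketch below sorts hypothetical category counts that way and prints simple text bars in place of drawn ones; the function name and data are illustrative assumptions.

```python
def pareto_order(frequencies):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


# Hypothetical complaint counts by category.
complaints = {"Billing": 12, "Delivery": 30, "Quality": 18, "Other": 5}
for category, count in pareto_order(complaints):
    print(f"{category:<10}{'#' * count}")
```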
Bar Charts (Used with Qualitative Data)
Multi-Bar Graph
• A bar graph represents the data by using vertical or horizontal bars whose heights
or lengths represent the frequencies of the data
• When the data are qualitative or categorical, bar graphs can be used