Describing Data New

Uploaded by

ashraf helmy
Copyright
© © All Rights Reserved

Describing Data:

Frequency Tables, Frequency Distributions, and Graphic Presentation


• Data are the facts and figures collected, summarized, analyzed, and interpreted.
• The data collected in a particular study are referred to as the data set.
• The elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements collected for a particular element is called an observation.
• The total number of data values in a complete data set is the number of elements
multiplied by the number of variables.
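As a quick check of the counting rule above, the sketch below uses a small made-up data set (the company names and variables are hypothetical, not from these notes) and counts its elements, variables, and data values.

```python
# Hypothetical data set: each key is an element, each inner dict is one observation.
data_set = {
    "Company A": {"employees": 120, "sector": "retail"},
    "Company B": {"employees": 45, "sector": "software"},
    "Company C": {"employees": 300, "sector": "energy"},
}

n_elements = len(data_set)                        # entities the data are collected on
n_variables = len(next(iter(data_set.values())))  # characteristics recorded per element
total_values = n_elements * n_variables           # elements multiplied by variables

print(n_elements, n_variables, total_values)
```

With 3 elements and 2 variables, the data set holds 6 data values.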
• After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
• This is the goal of descriptive statistical techniques.
• One method for simplifying and organizing data is to construct a frequency distribution.

Frequency Table

Frequency Distribution
A frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class. The table shows a frequency distribution for a set of quantitative data.

Frequency Distribution (Quantitative Data)


• Class midpoint: A point that divides a class into two equal parts. This is the
average of the upper- and lower-class limits.
• Class frequency: The number of observations in each class.
• Class interval: The class interval is obtained by subtracting the lower limit of a
class from the lower limit of the next class.
Relative Class Frequencies
⚫ Class frequencies can be converted to relative
class frequencies to show the fraction of the
total number of observations in each class.
⚫ A relative frequency captures the relationship
between a class total and the total number of
observations.
Relative Frequency Distribution
To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.
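A minimal sketch of that conversion, assuming a made-up frequency table (the class labels and counts below are hypothetical):

```python
def relative_frequencies(class_frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(class_frequencies.values())
    return {label: count / total for label, count in class_frequencies.items()}

# Hypothetical frequency distribution with 20 observations in total.
frequencies = {"24-30": 4, "31-37": 6, "38-44": 8, "45-51": 2}

for label, share in relative_frequencies(frequencies).items():
    print(f"{label}: {share:.2f}")
```

The relative frequencies always add up to 1 (or 100%), which is a handy check on the arithmetic.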
Grouped Frequency Distributions
❑ Class limits
❑ Lower class limit
❑ Upper class limit
❑ Class boundaries
❑ Lower class boundaries
❑ Upper class boundaries

• In this distribution, the values 24 and 30 of the first class are called “class limits”.
• 24 is the “lower class limit” and 30 is the “upper class limit.”
• The numbers in the second column are called class boundaries.
• The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
• Lower boundary = lower limit − 0.5
• Upper boundary = upper limit + 0.5
❑ Class limits should have the same decimal place value as the data, but the class
boundaries should have one additional place value and end in a 5.

For example: class limits 7.8-8.8 give class boundaries 7.75-8.85.
➢ Lower boundary = lower limit − 0.05 = 7.8 − 0.05 = 7.75
➢ Upper boundary = upper limit + 0.05 = 8.8 + 0.05 = 8.85
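The boundary rule can be sketched in code as follows. This is only an illustration: the function name is made up, and it assumes the limits are passed as strings so that the number of decimal places is known exactly.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary) for limits given as strings."""
    lower = Decimal(lower_limit)
    upper = Decimal(upper_limit)
    # Decimal places used by the limits (0 for whole numbers).
    places = max(-lower.as_tuple().exponent, -upper.as_tuple().exponent, 0)
    # Half of one unit in the last place: 0.5 for whole numbers, 0.05 for one decimal.
    half_unit = Decimal("0.5") * Decimal(10) ** (-places)
    return lower - half_unit, upper + half_unit

print(class_boundaries("24", "30"))    # whole-number limits
print(class_boundaries("7.8", "8.8"))  # limits with one decimal place
```

Using `Decimal` rather than floats keeps results such as 7.75 exact, so the boundaries always end in a 5 as the rule requires.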

• The lower class limit represents the smallest data value that can be included in the class.
• The upper class limit represents the largest data value that can be included in the class.
❑ The numbers used to separate the classes so that there are no gaps in the frequency distribution are called class boundaries.
❑ The class width is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
Class width = lower limit of the second class − lower limit of the first class
Class width = upper boundary of the first class − lower boundary of the first class
For example:

class width: 31-24 = 7

❑ The class midpoint $x_m$

$$x_m = \frac{\text{lower limit} + \text{upper limit}}{2} \quad \text{or} \quad x_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$

For example:
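A small sketch of the width and midpoint formulas, using the class 24-30 and the width example 31 − 24 = 7 from these notes. The function names are illustrative only.

```python
def class_width(lower_limit, next_lower_limit):
    """Lower limit of the next class minus lower limit of this class."""
    return next_lower_limit - lower_limit

def class_midpoint(lower, upper):
    """Average of the two limits (or, equivalently, of the two boundaries)."""
    return (lower + upper) / 2

print(class_width(24, 31))         # width of the class 24-30
print(class_midpoint(24, 30))      # midpoint from the class limits
print(class_midpoint(23.5, 30.5))  # midpoint from the class boundaries
```

Both midpoint calls give 27, since moving each limit outward by 0.5 does not change the average.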
Rules for Classes in Grouped Frequency Distributions
1. There should be 5-20 classes.
2. The class width should be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be continuous.
5. The classes must be exhaustive.
6. The classes must be equal in width (except in open-ended distributions).

Class interval
a) Inclusive Class Interval: when both the lower and the upper class limits are included.
Eg: class A 0-9
          B 10-19
          C 20-29
For Class B, Lower class boundary = 9.5
Upper class boundary = 19.5
Inclusive class interval = Upper class boundary − lower class boundary = 19.5 − 9.5 = 10
b) Exclusive Class Interval: when the lower limit is included, but the upper limit is excluded.
Eg: class A 0-10
          B 10-20
          C 20-30
For Class B, Lower class limit (LCL) = 10
Upper class limit (UCL) = 20 (not included)
Exclusive class interval = UCL − LCL = 20 − 10 = 10
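The two conventions can be compared in a short sketch. The function names are made up for illustration, and the inclusive case assumes whole-number data (so the boundaries sit 0.5 beyond the limits).

```python
def inclusive_width(lower_limit, upper_limit):
    """Width of an inclusive class: upper boundary minus lower boundary."""
    return (upper_limit + 0.5) - (lower_limit - 0.5)

def exclusive_width(lcl, ucl):
    """Width of an exclusive class: UCL minus LCL."""
    return ucl - lcl

def in_exclusive_class(value, lcl, ucl):
    """The lower limit is included, the upper limit is not."""
    return lcl <= value < ucl

print(inclusive_width(10, 19))         # Class B, inclusive form 10-19
print(exclusive_width(10, 20))         # Class B, exclusive form 10-20
print(in_exclusive_class(20, 10, 20))  # 20 belongs to the next class (20-30)
```

Both forms of Class B have a width of 10; they differ only in how the upper end is written and whether the value 20 falls in Class B or Class C.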
Hint:
Grouped Frequency Distribution
• Group by class intervals
• Report frequency for each interval
• Lose information: no exact values
• General rules
  • each interval same width
  • consecutive & do not overlap
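The grouping step might look like the sketch below. It assumes whole-number data and equal-width inclusive classes; the scores and the function name are hypothetical.

```python
def grouped_frequencies(values, first_lower, width, n_classes):
    """Tally whole-number values into consecutive, non-overlapping classes."""
    counts = [0] * n_classes
    for value in values:
        index = (value - first_lower) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside every class")
        counts[index] += 1
    # Report each class as (lower limit, upper limit, frequency).
    return [
        (first_lower + i * width, first_lower + (i + 1) * width - 1, counts[i])
        for i in range(n_classes)
    ]

# Hypothetical scores, grouped into 4 classes of width 7 starting at 24.
scores = [24, 27, 30, 31, 35, 38, 40, 44, 45, 51]
for lower, upper, frequency in grouped_frequencies(scores, 24, 7, 4):
    print(f"{lower}-{upper}: {frequency}")
```

The output keeps only the counts per interval, which illustrates the loss of information noted above: the exact scores cannot be recovered from the table.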
Distributions as graphs
• Summarizes data
• focus on clear communication
• Bar Graphs
  • nominal or ordinal data
  • discrete variables
• Histograms & Frequency Polygons
  • interval/ratio data
  • continuous & discrete variables
• Summarizes important characteristics of data
1. What is the shape of the distribution?
2. Where is the middle of the distribution?
3. How wide is the distribution?
Shapes of distributions

• Unimodal distribution
  • single value is most frequent
• Bimodal (or multimodal)
  • 2 most frequently occurring values
  • may indicate relevant subgroups
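One way to check this on raw data is to list the most frequent values. The sketch below uses `statistics.multimode` from the Python standard library (Python 3.8+); the data and the helper name are hypothetical, and it treats only exact ties for the highest frequency as modes.

```python
from statistics import multimode

def describe_modes(values):
    """Label a data set by how many values tie for the highest frequency."""
    modes = multimode(values)
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return label, modes

print(describe_modes([1, 2, 2, 3]))           # one most frequent value
print(describe_modes([2, 3, 3, 4, 5, 5, 6]))  # two values tie for most frequent
```

Note that if every value occurs once, all of them tie, so this simple check is only meaningful when some values repeat.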

Shape
• A graph shows the shape of the distribution.
• A distribution is symmetrical if the left side of the graph is (roughly) a mirror image
of the right side.
• One example of a symmetrical distribution is the bell-shaped normal distribution.
• On the other hand, distributions are skewed when scores pile up on one side of the distribution, leaving a "tail" of a few extreme values on the other side.
Positively and Negatively Skewed Distributions
• In a positively skewed distribution, the scores tend to pile up on the left side of the distribution with the tail tapering off to the right.
• In a negatively skewed distribution, the scores tend to pile up on the right side and
the tail points to the left.
SHAPES of Frequency Curves

Shapes of Distributions

• J-shaped: few data values on the left side, with frequencies increasing as one moves to the right.
• Reverse J-shaped: the opposite of the J-shaped distribution.

Constructing a Frequency Table – Example
⚫ Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose the smallest k such that 2^k > n.
There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then 2^6 = 64, somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then 2^7 = 128, which is greater than 80. So the recommended number of classes is 7.
⚫ Step 2: Determine the class interval or width.
The formula is: i ≥ (H − L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
($35,925 − $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.
⚫ Step 3: Set the individual class limits
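The three steps can be sketched as below, using n = 80, H = $35,925 and L = $15,546 from the example. Two things are assumptions rather than part of these notes: rounding the width up to the next multiple of 1,000 (which reproduces the $3,000 chosen above), and starting the first class at $15,000 as a convenient value just below L.

```python
import math

def number_of_classes(n):
    """Smallest k with 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def round_up(value, step):
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step

n, highest, lowest = 80, 35_925, 15_546

k = number_of_classes(n)                     # Step 1
raw_width = (highest - lowest) / k           # Step 2: (H - L) / k
width = round_up(raw_width, 1_000)           # assumed rounding step
first_lower = 15_000                         # assumed starting limit, Step 3
class_limits = [
    (first_lower + i * width, first_lower + (i + 1) * width)
    for i in range(k)
]

print(k, round(raw_width), width)
for lower, upper in class_limits:
    print(f"${lower:,} up to ${upper:,}")
```

The last class ends at $36,000, so the seven classes cover every value from the lowest to the highest observed price.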

Graph of a Frequency Distribution, for Quantitative Data
The three commonly used graphic forms are:
⚫ Histograms
⚫ Frequency polygons
⚫ Cumulative frequency distributions
Histogram

A histogram for a frequency distribution based on quantitative data is very similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.

Frequency Polygon

A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.

Cumulative Frequency Distribution

A cumulative frequency distribution graph, or ogive, is a line graph that displays the cumulative frequency of each class at its upper class boundary.
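The points plotted on an ogive can be computed with a running total. In this sketch the class boundaries and frequencies are hypothetical, and `itertools.accumulate` from the standard library does the summing.

```python
from itertools import accumulate

# Hypothetical classes: upper class boundaries and their class frequencies.
upper_boundaries = [30.5, 37.5, 44.5, 51.5]
frequencies = [3, 2, 3, 2]

# Running total of the frequencies, one value per class.
cumulative = list(accumulate(frequencies))

# Each ogive point pairs an upper boundary with the cumulative frequency.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency equals the total number of observations (10 here), which is a quick consistency check.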

Line Graph

• A time series graph represents data that occur over a specific period of time.
• When data are collected over a period of time, they can be represented by a time series graph.
• Compound time series graph (line chart): when two data sets are compared on the same graph.

Stem and Leaf Plot

Line Plot

Box Plot

Box plot
• A box plot is a concise graph showing the five-point summary.
• Multiple box plots can be drawn side by side to compare more than one data set.

Advantages
• Shows 5-point summary and outliers
• Easily compares two or more data sets
• Handles extremely large data sets easily

Disadvantages
• Not as visually appealing as other graphs
• Exact values are not retained
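The five-point summary behind a box plot can be sketched as follows. These notes do not say how the quartiles are computed, so this example assumes one common convention: the quartiles are the medians of the lower and upper halves, with the middle value left out when the number of observations is odd. The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # Skip the middle value when the count is odd.
    upper_half = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])

print(five_number_summary([7, 1, 13, 5, 9, 3, 11]))
```

Other quartile conventions exist and can give slightly different Q1 and Q3 values for the same data.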

Scatter Plot

No Correlation
If there is absolutely no correlation present, the value of the correlation coefficient is 0.

Perfect linear correlation:

A perfect positive correlation is given the value of 1.

A perfect negative correlation is given the value of -1.

Strong linear correlation:
The closer the number is to 1 or -1, the stronger the correlation, or the stronger the
relationship between the variables.

Weak linear correlation:

The closer the number is to 0, the weaker the correlation.
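To see these values in practice, the sketch below computes a correlation coefficient by hand. It assumes the number referred to above is the Pearson correlation coefficient, which these notes do not name explicitly; the data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfect negative
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # no linear correlation
```

The third data set follows a curve rather than a line, which is why its linear correlation is 0 even though the variables are clearly related.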

Pie Charts (Used with Qualitative Data)

Pie (circle) charts - more info
⚫ A way of summarizing a set of categorical data or displaying the different values
of a given variable (e.g. percentage distribution).
⚫ A circle is divided into a series of segments. Each segment represents a particular
category.
⚫ The area of each segment is the same proportion of a circle’s area as the category
is of the total data set.
⚫ Quite popular. Circle provides a visual concept of the whole (100%).
⚫ Best used for displaying statistical information when there are no more than six
components – otherwise, the resulting picture will be too complex to
understand.
⚫ Pie charts are not useful when the values of each component are similar because
it is difficult to see the differences between slice sizes.
⚫ A pie graph is a circle that is divided into sections or wedges according to the
percentage of frequencies in each category of the distribution.
⚫ The purpose of the pie graph is to show the relationship of the parts to the whole
by visually comparing the sizes of the sections.
⚫ Percentages or proportions can be used.
⚫ The variable is nominal or categorical.
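The size of each wedge follows directly from the category's share of the total. A minimal sketch, with hypothetical categories and counts:

```python
def pie_slices(counts):
    """Map each category to (percentage of total, central angle in degrees)."""
    total = sum(counts.values())
    return {
        category: (count * 100 / total, count * 360 / total)
        for category, count in counts.items()
    }

# Hypothetical survey of how 20 people travel to work.
travel = {"car": 10, "bus": 5, "bike": 3, "walk": 2}

for category, (percent, degrees) in pie_slices(travel).items():
    print(f"{category}: {percent:.0f}% of the circle, {degrees:.0f} degrees")
```

The percentages sum to 100 and the angles to 360, matching the idea that the circle represents the whole.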

Pareto Charts
A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
When the variable displayed on the horizontal axis is qualitative or categorical, a Pareto chart can be used.
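The ordering that distinguishes a Pareto chart from an ordinary bar chart is a simple sort by frequency. A small sketch with hypothetical complaint categories:

```python
def pareto_order(counts):
    """Return (category, frequency) pairs from highest to lowest frequency."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical customer complaint counts.
complaints = {
    "late delivery": 12,
    "damaged item": 30,
    "wrong item": 7,
    "billing error": 18,
}

for category, frequency in pareto_order(complaints):
    print(f"{category}: {frequency}")
```

The bars would then be drawn left to right in this order, so the most frequent category comes first.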

Bar Charts (Used with Qualitative Data)

• A bar graph uses bars to show data.
• The data can be in words or numbers.
• The bars can be vertical (up and down) or horizontal (across).
• Vertical Bar Graph: Displays data better than horizontal bar graphs,
and is preferred when possible.
• Horizontal Bar Graph: Useful when category names are too long to
fit at the foot of a column.
Double Bar Graph

Multi-Bar Graph

• A bar graph represents the data by using vertical or horizontal bars whose heights
or lengths represent the frequencies of the data
• When the data are qualitative or categorical, bar graphs can be used
