Unit 2 - Summarizing Data - Charts and Tables
Introductory Statistics
Unit 2
Summarizing Tables and
Charts
Learning Objectives
1. Compare the use of bar charts and histograms in presenting data.
2. Explain the associated concepts of class, class limits, class marks and
class width.
3. Discuss the use of frequency charts, including the ogive; create and
interpret histograms, frequency charts, stem-and-leaf displays, box
plots and scatter plots using MINITAB.
Introduction
• Raw data on their own are not very informative.
• Frequency distribution tables can be used for both categorical and numeric
variables. Continuous variables should first be grouped into class intervals.
Summarizing Data: Frequency Distributions
Constructing a grouped frequency table:
• Class marks: the midpoints of the classes, obtained by averaging the
lower and upper class limits:
class mark = (lower limit + upper limit) / 2
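As a minimal sketch of these definitions, the following Python snippet builds a grouped frequency table and reports the class marks and class width. The scores, the class limits, and the function name grouped_frequency_table are assumptions made up for illustration; they do not come from this unit.

```python
# Illustrative data only: 20 made-up exam scores (not from the unit).
scores = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71,
          72, 74, 75, 77, 78, 81, 83, 85, 88, 94]

# Assumed class limits (lower, upper), both inclusive.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]


def grouped_frequency_table(data, class_limits):
    """Return one row per class: limits, class mark, and frequency."""
    rows = []
    for lower, upper in class_limits:
        mark = (lower + upper) / 2  # class mark = midpoint of the limits
        freq = sum(lower <= x <= upper for x in data)
        rows.append({"lower": lower, "upper": upper,
                     "mark": mark, "frequency": freq})
    return rows


table = grouped_frequency_table(scores, classes)

# Class width: difference between consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

for row in table:
    print(f"{row['lower']}-{row['upper']}  mark={row['mark']}  f={row['frequency']}")
print("class width =", class_width)
```

Every observation falls into exactly one class because the assumed limits do not overlap and together cover the whole range of the data.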
Summarizing Data: Frequency Distributions (Examples)
Relative Frequency Table
Cumulative Frequency
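A relative frequency is a class frequency divided by the total number of observations, and a cumulative frequency is the running total of the class frequencies. The sketch below computes both; the class labels and frequencies are hypothetical values chosen for illustration, not the tables from the slides.

```python
from itertools import accumulate

# Hypothetical class frequencies for five classes (illustration only).
labels = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [3, 5, 7, 4, 1]

total = sum(frequencies)
relative = [f / total for f in frequencies]    # proportions that sum to 1
cumulative = list(accumulate(frequencies))     # running totals

for label, f, rel, cum in zip(labels, frequencies, relative, cumulative):
    print(f"{label:>6}  f={f:<2}  rel={rel:.2f}  cum={cum}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.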
Presenting Charts: Bar Chart
• Used to summarize categorical data that are usually measured on a nominal
or an ordinal scale.
• In a simple bar chart, which in some instances displays a frequency
distribution, there is usually a space between consecutive bars. A histogram
has no such spaces.
• Some bar charts display other descriptive statistics, such as counts (absolute
frequencies), proportions (indicating incidence, prevalence, exposure, etc.),
and means.
• Disadvantages
1. Can easily be manipulated to give false impressions.
2. Fail to reveal key assumptions, causes, effects, or patterns.
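The counts and proportions that a bar chart displays can be tabulated before any chart is drawn. The following sketch does so for a nominal variable and prints a rough text bar chart; the survey responses are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical survey responses on a nominal scale (illustration only).
responses = ["bus", "car", "car", "walk", "bus", "car", "bicycle", "car"]

counts = Counter(responses)                      # absolute frequencies
n = len(responses)
proportions = {category: count / n for category, count in counts.items()}

# A text bar chart: one separate bar per category.
for category, count in counts.most_common():
    print(f"{category:<8} {'#' * count} ({proportions[category]:.0%})")
```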
Presenting Charts: Bar Chart (Example)
Presenting Charts: Bar Chart (Interpretation)
• The bar chart shows that women are more likely than men to
give greeting cards on Valentine's Day.
• In a histogram there are no spaces between adjacent bars, unlike in bar charts for categorical data.
• A histogram can assume any one of a large number of shapes. The most common of
these shapes are:
1. Symmetric
2. Skewed
3. Uniform or rectangular
Presenting Charts: Histogram
Advantages
1. Histograms provide a way to display the frequency of occurrences of data
along an interval.
2. Histograms are useful and easy to construct, and they apply to both
continuous and discrete data.
3. Histograms are generally used when dealing with large data sets.
4. They can be used to detect any unusual observation (outlier) or any gaps
in the dataset.
Disadvantages
1. Exact values cannot be read, since the data are grouped into classes.
2. The use of intervals prevents the calculation of an exact measure of
central tendency.
3. It is difficult to compare two data sets, and difficult to present several
histograms at a time for comparison.
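The loss of exactness caused by grouping can be illustrated numerically. In the sketch below, the mean estimated from class marks and class frequencies (all that a histogram retains) is compared with the mean of the raw values. The data and class limits are hypothetical.

```python
# Hypothetical raw data and the class limits used to group them.
raw = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71,
       72, 74, 75, 77, 78, 81, 83, 85, 88, 94]
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

# What a histogram retains: one frequency per class.
frequencies = [sum(lo <= x <= hi for x in raw) for lo, hi in classes]
marks = [(lo + hi) / 2 for lo, hi in classes]

# Grouped mean: every value is treated as if it sat at its class mark.
n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, marks)) / n

# Exact mean: needs the raw values, which the histogram no longer shows.
exact_mean = sum(raw) / len(raw)

print("grouped mean:", grouped_mean)
print("exact mean:  ", exact_mean)
```

The two results are close but not equal, which is the sense in which grouped data give only an approximate measure of central tendency.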
Presenting Charts: Histogram (Example)
Presenting Charts: Histogram (Interpretation)
• The histogram shows how often persons in the sample attend
religious services. From the histogram one can deduce that
most persons in the sample (125 persons) attend
religious services several times a year.
• The histogram shows that few persons in the sample
(8 persons) attend religious services several times a
week.
• To draw the ogive graph on the previous slide, the variable, which is total
iPods sold, is marked on the horizontal axis and the cumulative frequencies
on the vertical axis.
• Then the dots are marked above the upper boundaries of various classes at
the heights equal to the corresponding cumulative frequencies.
• Note that the ogive starts at the lower boundary of the first class and ends at
the upper boundary of the last class.
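The plotting steps above can be sketched as a short computation of the ogive's points. The class limits and frequencies below are hypothetical, since the iPod table from the slides is not reproduced here; the half-unit boundary adjustment assumes integer-valued data.

```python
from itertools import accumulate

# Hypothetical classes (inclusive integer limits) and frequencies.
class_limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
frequencies = [3, 6, 8, 8, 5]

# Class boundaries sit halfway between adjacent class limits.
gap = class_limits[1][0] - class_limits[0][1]        # 10 - 9 = 1
lower_boundary_first = class_limits[0][0] - gap / 2
upper_boundaries = [upper + gap / 2 for _, upper in class_limits]

cumulative = list(accumulate(frequencies))

# The ogive starts at the first lower boundary with height 0, then has
# one point above each upper boundary at the cumulative frequency.
ogive_points = [(lower_boundary_first, 0)] + list(zip(upper_boundaries, cumulative))

for x, y in ogive_points:
    print(x, y)
```

Joining these points with straight line segments gives the ogive; the final point has a height equal to the total number of observations.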
Presenting Charts: Frequency Charts
Table showing the frequency of persons attending religious services (MINITAB output)
• The frequency chart or table shows how often persons in the sample attend
religious services. From the table one can deduce that most persons (125
persons) in the sample attend religious services several times a year,
that is, 19.7% of the sample.
• The table also shows that few persons in the sample (8 persons)
attend religious services several times a week.
• The second-largest group, 16.6% of the sample, attends religious services
about once or twice a year.
• A stem-and-leaf display is a way of summarizing a set of data measured on the interval scale. It is often used in
exploratory data analysis to illustrate the major features of the distribution of data in a
convenient and easily drawn form.
Advantages:
1. Easy to construct and to make observations from.
2. It is a more informative display for relatively small data sets (<100 data points).
3. Can be used to compare more than one data set.
4. A stem-and-leaf plot can be used to determine measures of central tendency such as
the mean, median and mode.
Disadvantages
1. Useful only for describing small data sets.
Presenting Charts: Stem & Leaf Displays
• A MINITAB stem plot for this data (created using the "STEM" command) is shown
below. MINITAB first truncates the data by rounding down to integers, then sorts
the data. The resulting dataset is the following:
1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13, 14, 14, 15, 15, 15, 16, 16, 17, 18,
23.
• N represents the sample size (25), and the leaf unit gives the place value of each leaf digit (1.0).
• The first column of the MINITAB stem-and-leaf plot shows cumulative frequencies:
the number of values counted from the top down and from the
bottom up toward the middle value (the median). The number in brackets, (5), is
the count of values in the row containing the median, which is the thirteenth ordered
value in this example, 13.0.
• The second column shows the stems, which take the values 0, 1, and 2. The third
column shows the leaves.
• The stem plot illustrates that the majority of the measurements lie in the teens, with
only 6 of the 25 values less than 10 and only 1 value greater than 20.
• For instance, the first row shows a count of 1, for the single value 1, and
the 7th row contains the values 12, 12, 13, 13, 13.
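The three columns described above can be rebuilt from the 25 listed values. The sketch below is an approximation of the display, not MINITAB's own algorithm. It assumes each row covers two leaf digits (inferred from the 7th row holding 12, 12, 13, 13, 13) and an odd sample size; the function name stem_and_leaf is hypothetical.

```python
# The 25 truncated values listed in the text.
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

LINE_WIDTH = 2  # assumption: each display row covers two leaf digits


def stem_and_leaf(values, line_width=LINE_WIDTH):
    """Return rows of (count column, stem, leaves) with a leaf unit of 1."""
    values = sorted(values)
    n = len(values)
    first = values[0] // line_width
    last = values[-1] // line_width
    median_pos = (n + 1) // 2  # position of the median when n is odd

    rows, seen = [], 0
    for k in range(first, last + 1):
        line = [v for v in values if v // line_width == k]
        before, seen = seen, seen + len(line)
        if before < median_pos <= seen:
            depth = f"({len(line)})"   # row containing the median
        elif seen < median_pos:
            depth = str(seen)          # counted from the top down
        else:
            depth = str(n - before)    # counted from the bottom up
        stem = (k * line_width) // 10
        leaves = "".join(str(v % 10) for v in line)
        rows.append((depth, stem, leaves))
    return rows


rows = stem_and_leaf(data)
for depth, stem, leaves in rows:
    print(f"{depth:>4}  {stem} | {leaves}")
```

Rows with no leaves are kept so that gaps in the data remain visible in the display.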
Presenting Charts: Box Plots
• A box plot is a way of summarizing a set of data measured on an interval
scale. It is often used in exploratory data analysis.
• The box plot usually consists of the most extreme values in the data set
(maximum and minimum values), the lower and upper quartiles and the
median.
• These MINITAB box plots represent lottery payoffs for winning numbers in three
time periods (May 1975-March 1976, November 1976-September 1977, and
December 1980-September 1981).
• The median for each dataset is indicated by the black center line, and the first and
third quartiles are the edges of the red box, whose length is the inter-quartile
range (IQR).
• The most extreme values that lie within 1.5 times the inter-quartile range of the upper or
lower quartile are the ends of the lines extending from the box.
• Points farther than 1.5 times the IQR from the nearer quartile are plotted
individually as asterisks. These points represent potential outliers.
• In this example, the three box plots have nearly identical median values. The IQR is
decreasing from one time period to the next, indicating reduced variability of payoffs
in the second and third periods. In addition, the extreme values are closer to the
median in the later time periods.
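The quantities a box plot displays can be computed directly. Since the lottery payoffs are not reproduced in this unit, the sketch below reuses the 25 values from the stem-and-leaf example. It uses Python's statistics.quantiles with the "inclusive" method; quartile conventions differ between packages, so MINITAB may report slightly different quartiles for the same data.

```python
import statistics

# The 25 values from the stem-and-leaf example, reused for illustration.
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

# Quartiles (one of several conventions in common use).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: 1.5 times the IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that lie inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Values outside the fences are plotted individually as potential outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("whiskers:", whisker_low, whisker_high)
print("potential outliers:", outliers)
```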
Presenting Charts: Scatter Plot
• A scatter plot is a useful summary of a set of bivariate data (two
variables), usually drawn before working out a linear correlation
coefficient or fitting a regression line. It gives a good visual picture of the
relationship between the two variables, and aids the interpretation of the
correlation coefficient or regression model.
• Each unit contributes one point to the scatter plot, on which points are
plotted but not joined. The resulting pattern indicates the type and
strength of the relationship between the two variables.
• Or, there might not be any notable association, in which case a scatter
plot would not indicate any trends whatsoever.
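The linear correlation coefficient mentioned above summarizes in one number what the scatter plot shows visually. The following sketch computes Pearson's r from its definition; the paired values and the function name pearson_r are hypothetical and serve only as an illustration.

```python
from math import sqrt

# Hypothetical bivariate data (illustration only): one (x, y) pair per unit.
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 64, 70, 73]


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to a plot with no visible linear trend.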
• This MINITAB scatter plot displays the association between the size of a diamond (in carats)
and its retail price (in Singapore dollars) for 48 observations.
Diagram 1 Diagram 2
• The scatter plot clearly indicates that there is a positive association between size
and price. (See diagram 1)
• A median trace plot clarifies the positive association between size and price. To
create this plot, the horizontal axis (size) is divided into equally spaced segments,
and the median of the corresponding y-values (price) is plotted above the
midpoint of each segment. The points are connected to form the median trace.
(See diagram 2)
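The construction of a median trace can be sketched as follows. The (size, price) pairs are hypothetical, since the 48 diamond observations are not reproduced here, and the function name median_trace and the choice of three segments are assumptions.

```python
import statistics

# Hypothetical (size, price) pairs (illustration only).
points = [(0.10, 200), (0.13, 260), (0.17, 340),
          (0.22, 500), (0.25, 610), (0.28, 700),
          (0.31, 900), (0.35, 1000), (0.40, 1300)]


def median_trace(pairs, n_segments):
    """Return (segment midpoint, median of y) for each non-empty segment."""
    xs = [x for x, _ in pairs]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / n_segments   # equally spaced segments
    trace = []
    for i in range(n_segments):
        left = x_min + i * width
        right = left + width
        is_last = i == n_segments - 1
        # Each segment includes its left edge; the last one also includes x_max.
        ys = [y for x, y in pairs
              if left <= x < right or (is_last and x == x_max)]
        if ys:  # skip segments that contain no points
            trace.append((left + width / 2, statistics.median(ys)))
    return trace


trace = median_trace(points, 3)
for midpoint, med in trace:
    print(f"{midpoint:.2f}  {med}")
```

Connecting the resulting points in order of size gives the trace; medians are used rather than means so that a single unusually priced item has little effect on the line.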