
Unit 2 - Summarizing Data - Charts and Tables

The document provides an overview of summarizing and presenting data through tables and charts. It discusses frequency distributions and how to construct frequency tables. It also explains different types of charts including bar charts, histograms, cumulative frequency graphs, frequency charts, and stem and leaf displays. Examples and interpretations are provided for each. The key learning objectives are to understand how to summarize and present data visually and numerically.

Uploaded by jemima
Copyright © All Rights Reserved

ECON 1005

Introductory Statistics
Unit 2
Summarizing Data: Tables and Charts
Learning Objectives
1. Compare the use of bar charts and histograms in presenting data.
2. Explain the associated concepts of class, class limits, class marks and
class width.
3. Discuss the use of frequency charts, including the ogive.
4. Create and interpret histograms, frequency charts, stem-and-leaf displays, box plots and scatter plots using MINITAB.
Introduction
• Raw data is not very informative.

• Data must be turned into information through tables or charts (and through statistics, covered in the next chapter).
Summarizing Data: Summary Tables
The basic forms of summary for a dataset are the Summary Table and the Frequency Table.

• They provide users of the data with a summary.

• They bring together a ‘mass’ of connected information for digestion ‘at a glance’.

• They help both the researcher and the user to draw preliminary conclusions from the data.

• They can quickly become cluttered, and hence confusing, so great care must be taken in their construction.
Summarizing Data: Frequency Distributions
• The frequency of a particular observation is the number of times the observation occurs in the data. The distribution of a variable is the pattern of frequencies of its observations.

• Frequency distributions show either the actual number of observations falling in each range or the percentage of observations. In the latter instance, the distribution is called a “relative frequency distribution.”

• Frequency distribution tables can be used for both categorical and numeric variables. Continuous variables should only be tabulated using class intervals.
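As a minimal sketch of these definitions, the Python snippet below counts how often each category occurs (the frequency) and divides each count by the number of observations (the relative frequency). The survey responses are hypothetical and were made up for illustration.

```python
from collections import Counter

# Hypothetical categorical responses (not from the course data).
responses = ["Yes", "No", "Yes", "Unsure", "Yes", "No", "Yes", "No"]

def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    n = len(observations)
    return {cat: (freq, freq / n) for cat, freq in counts.items()}

table = frequency_table(responses)
for category, (freq, rel) in table.items():
    print(f"{category:<8}{freq:>4}{rel:>8.1%}")
```

The relative frequencies always sum to 1 (100%), because every observation is counted exactly once.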
Summarizing Data: Frequency Distributions
Constructing a grouped frequency table:

• Divide the range of values into a finite number of equal sub-intervals, called classes.

• Classes must be defined so that no observation from the survey data could fall into more than one class.

• Too many classes will give the table a cluttered appearance. It is suggested that you use between 6 and 15 classes.
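The steps above can be sketched in Python for integer data. The example assumes the 25 values listed in the MINITAB stem-and-leaf example later in this unit and an illustrative choice of 8 classes of width 3. The function name `grouped_frequency` is hypothetical and is not a MINITAB command.

```python
# The 25 integer values from the MINITAB stem-and-leaf example in this unit.
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def grouped_frequency(values, start, width, n_classes):
    """Count integer values in equal-width, non-overlapping classes.

    Class i runs from start + i*width to start + (i+1)*width - 1,
    so every integer value falls into exactly one class.
    """
    counts = [0] * n_classes
    for x in values:
        i = (x - start) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[i] += 1
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]

# 8 classes of width 3 (0-2, 3-5, ..., 21-23), within the suggested 6 to 15.
table = grouped_frequency(data, start=0, width=3, n_classes=8)
for lower, upper, freq in table:
    print(f"{lower:>2}-{upper:<2}  {freq}")
```

Each class ends one unit before the next begins, so no integer can be counted twice. Continuous data would need class boundaries instead.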
Summarizing Data: Frequency Distributions
• Class Limits – There are two for each class. The lower class limit of a class is the smallest data value that can go into the class. The upper class limit of a class is the largest data value that can go into the class.

• Class Marks – The midpoints of the classes, obtained by averaging the lower and upper class limits.

• Class Width – The difference between two successive class marks.

• Class Frequency – The number of observations that fall into the class.
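For illustration, assume the classes 0-2, 3-5, ..., 21-23, which are a hypothetical choice for integer data. The sketch below computes each class mark as the average of its limits, and the class width as the difference between successive class marks.

```python
# Hypothetical class limits for integer data.
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20), (21, 23)]

# Class mark: average of the lower and upper class limits.
marks = [(lower + upper) / 2 for lower, upper in classes]

# Class width: difference between two successive class marks
# (collected in a set to check that all classes have the same width).
widths = {b - a for a, b in zip(marks, marks[1:])}

print(marks)   # [1.0, 4.0, 7.0, 10.0, 13.0, 16.0, 19.0, 22.0]
print(widths)  # {3.0}
```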
Summarizing Data: Frequency Distributions (Examples)
[Example frequency table with the lower and upper class limits labelled]
Summarizing Data: Frequency Distributions (Examples)
[Example relative frequency table and cumulative frequency table]
Presenting Charts: Bar Chart
• Used to summarize categorical data that are usually measured on a nominal or an ordinal scale.

• Displays each category as a rectangle whose length is proportional to that category’s share of the total population or sample.

• Bar charts can be drawn horizontally or vertically, in two-dimensional or three-dimensional form.

• In a simple bar chart, which in some instances displays a frequency distribution, there is usually a space between two consecutive bars. A histogram has no such space.
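The following rough text-mode sketch shows the idea that each bar's length is proportional to its category's share of the total. The category names and counts are hypothetical. `max_len`, the length of a bar representing 100%, is an arbitrary display choice.

```python
# Hypothetical category counts (illustration only).
counts = {"Red": 30, "Blue": 20, "Green": 40, "Other": 10}

def text_bar_chart(counts, max_len=40):
    """One text bar per category; bar length is proportional to its share of the total."""
    total = sum(counts.values())
    lines = []
    for category, count in counts.items():
        share = count / total
        lines.append(f"{category:<6}|{'#' * round(share * max_len)} {share:.0%}")
    return lines

for line in text_bar_chart(counts):
    print(line)
```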
Presenting Charts: Bar Chart
• Some bar charts display other descriptive statistics, such as counts (absolute frequencies), proportions indicating incidence, prevalence, exposure, etc., and means.

• Bar charts can be simple or compound. Compound bar charts are used to examine bivariate relationships.

• Since compound bar charts permit exploration of two-way relationships, three-way relationships can be examined by generating a compound bar chart for each category of a third (control) variable.
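A compound bar chart is drawn from a two-way (cross-tabulated) frequency table. The sketch below builds such a table from hypothetical (region, employment status) pairs, and each row would become one cluster of bars. For a three-way relationship, the same hypothetical `crosstab` function could be run separately on the observations belonging to each category of the third variable.

```python
from collections import Counter

# Hypothetical (region, employment status) observations.
observations = [
    ("Urban", "Employed"), ("Urban", "Employed"), ("Urban", "Unemployed"),
    ("Rural", "Employed"), ("Rural", "Unemployed"), ("Rural", "Unemployed"),
    ("Urban", "Employed"),
]

def crosstab(pairs):
    """Two-way frequency table: one row per value of the first variable,
    one column per value of the second."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, {r: [counts[(r, c)] for c in cols] for r in rows}

rows, cols, table = crosstab(observations)
print("", *cols, sep="\t")
for r in rows:
    print(r, *table[r], sep="\t")
```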
Presenting Charts: Bar Chart
• Advantages
1. Summarize a large data set in visual form.
2. Clarify trends better than do tables.
3. Permit a visual check of the accuracy and reasonableness of
calculations.
4. Make it easy to compare two or three sets of data.

• Disadvantages
1. Can easily be manipulated to yield false impressions.
2. Fail to reveal key assumptions, causes, effects, or patterns.
Presenting Charts: Bar Chart (Example)
Presenting Charts: Bar Chart (Interpretation)
• The bar chart shows that women are more likely than men to give greeting cards on Valentine's Day.

• Men are more likely than women to give flowers on Valentine's Day.

• Jewelry is the gift that women are least likely to give on Valentine's Day.
Presenting Charts: Histogram
• A histogram is a bar chart for grouped numerical data in which the frequencies or
percentages of each group of numerical data are represented as individual vertical bars.

• The classes of the variable of interest are shown on the horizontal axis (X axis).

• Class midpoints are often used to label the classes. So if a class runs from 1 to 3, 2 might be shown as its label on the horizontal axis.

• Frequencies are shown on the vertical axis (Y axis).

• Unlike bar charts for categorical data, a histogram has no spaces between adjacent bars.

• A histogram can assume any one of a large number of shapes. The most common of these shapes are:
1. Symmetric
2. Skewed
3. Uniform or rectangular
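Here is a minimal text sketch of a histogram for integer data, with each class labelled by its midpoint as described above. It reuses the 25 values from the MINITAB stem-and-leaf example later in this unit, with hypothetical classes 0-4, 5-9, ..., 20-24. The bars are drawn sideways, one per line, with no gaps between adjacent classes.

```python
# The 25 integer values from the MINITAB stem-and-leaf example in this unit.
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def histogram_counts(values, start, width, n_classes):
    """(class mark, frequency) for equal-width classes of integer data.
    Class i covers start + i*width to start + (i+1)*width - 1."""
    counts = [0] * n_classes
    for x in values:
        i = (x - start) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[i] += 1
    # Label each class by its midpoint, e.g. class 0-4 -> 2.0.
    return [(start + i * width + (width - 1) / 2, c) for i, c in enumerate(counts)]

for mark, freq in histogram_counts(data, start=0, width=5, n_classes=5):
    print(f"{mark:>5}|{'*' * freq}")
```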
Presenting Charts: Histogram
Advantages
1. Histograms provide a way to display the frequency of occurrences of data along an interval.
2. Histograms are useful and easy to construct, and they apply to both continuous and discrete data.
3. Histograms are generally used when dealing with large data sets.
4. They can be used to detect unusual observations (outliers) or gaps in the dataset.
Disadvantages
1. Exact values cannot be read, since the data are grouped into classes.
2. The use of intervals prevents the calculation of an exact measure of central tendency.
3. It is more difficult to compare two data sets, since it is difficult to present several histograms at a time for comparison.
Presenting Charts: Histogram (Example)
Presenting Charts: Histogram (Interpretation)
• The histogram shows the frequency of persons attending religious services. The largest group in the sample (125 persons) attend religious services several times a year.

• The histogram shows that few persons in the sample (8 persons) attend religious services several times a week.

• 12.7 percent of persons attend religious services once a year.


Presenting Charts: Cumulative Frequency Graphs

• A cumulative frequency graph, or ogive, is a line graph displaying the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis. The graph should start at (or just before) the lower boundary of the first class (where the cumulative frequency is zero), and end at the upper boundary of the last class. The graph never decreases from left to right, and when the classes have equal widths the points are evenly spaced along the horizontal axis.

• One advantage of an ogive is that it can be used to approximate the cumulative frequency for any interval.
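The following sketch shows how an ogive approximates cumulative frequencies. It assumes a hypothetical frequency table (classes 0-10, 10-20, 20-30 and 30-40 with frequencies 4, 10, 6 and 2). It reads values between plotted points off the straight line joining them (linear interpolation). If F(x) is the approximate number of values up to x, the count in an interval (a, b] is approximately F(b) - F(a).

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical classes with boundaries 0-10, 10-20, 20-30, 30-40.
boundaries = [0, 10, 20, 30, 40]
freqs = [4, 10, 6, 2]

# Ogive vertices: cumulative frequency 0 at the first lower boundary,
# then the cumulative frequency at each upper class boundary.
points = [(boundaries[0], 0)] + list(zip(boundaries[1:], accumulate(freqs)))

def approx_cumulative(points, x):
    """Read the ogive at x by linear interpolation between vertices."""
    xs = [px for px, _ in points]
    if x <= xs[0]:
        return 0
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)               # xs[i - 1] < x <= xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(approx_cumulative(points, 15))                                   # 9.0
print(approx_cumulative(points, 25) - approx_cumulative(points, 15))  # 8.0
```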
Presenting Charts: Cumulative Frequency Graphs

Table 1 Ogive graph


Presenting Charts: Cumulative Frequency Graphs

• When plotted on a diagram, the cumulative frequencies give a curve called an ogive (pronounced o-jive).

• To draw the ogive graph on the previous slide, the variable, which is total iPods sold, is marked on the horizontal axis and the cumulative frequencies on the vertical axis.

• Then a dot is marked above the upper boundary of each class at a height equal to the corresponding cumulative frequency.

• The ogive is obtained by joining consecutive points with straight lines.

• Note that the ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.
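The plotting steps can be sketched in Python under two assumptions. First, the class limits and frequencies are hypothetical integer-data classes. Second, each class boundary lies halfway between one class's upper limit and the next class's lower limit, a common convention. The output is the list of points that would be joined with straight lines.

```python
from itertools import accumulate

# Hypothetical class limits (integer data) and their frequencies.
limits = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [3, 7, 8, 2]

def class_boundaries(limits):
    """Boundaries halfway between one class's upper limit and the next lower limit."""
    half_gap = (limits[1][0] - limits[0][1]) / 2      # 0.5 for these classes
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]

bounds = class_boundaries(limits)
# Start at the lower boundary of the first class (cumulative frequency 0),
# then one point above each upper boundary at the cumulative frequency.
points = [(bounds[0][0], 0)] + [
    (upper, cum) for (_, upper), cum in zip(bounds, accumulate(freqs))
]
print(points)  # [(9.5, 0), (19.5, 3), (29.5, 10), (39.5, 18), (49.5, 20)]
```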
Presenting Charts: Frequency Charts
Table showing the frequency of persons attending religious services (MINITAB output)
Presenting Charts: Frequency Charts
• The frequency chart or table shows the frequency of persons attending religious services. The largest group in the sample (125 persons, or 19.7% of the sample) attend religious services several times a year.

• The table also shows that few persons in the sample (8 persons) attend religious services several times a week.

• The second-largest group, 16.6% of the sample, attend religious services about once or twice a year.

• When creating frequency tables, measures of central tendency can also be calculated using MINITAB.
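The following sketch builds a frequency table with counts, cumulative counts, percentages and cumulative percentages, similar in layout to the MINITAB output described above. The categories and counts are hypothetical and are not the religious-attendance data. The column headings are illustrative only.

```python
from itertools import accumulate

# Hypothetical counts for an ordered categorical variable.
counts = {"Never": 6, "Rarely": 9, "Monthly": 3, "Weekly": 2}

def tally(counts):
    """Rows of (category, count, cumulative count, percent, cumulative percent)."""
    n = sum(counts.values())
    cumulative = accumulate(counts.values())
    return [(cat, c, cc, 100 * c / n, 100 * cc / n)
            for (cat, c), cc in zip(counts.items(), cumulative)]

print(f"{'Category':<9}{'Count':>6}{'CumCnt':>7}{'Percent':>9}{'CumPct':>8}")
for cat, c, cc, pct, cpct in tally(counts):
    print(f"{cat:<9}{c:>6}{cc:>7}{pct:>9.1f}{cpct:>8.1f}")

# The largest group (the modal category):
print(max(counts, key=counts.get))  # Rarely
```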
Presenting Charts: Stem & Leaf Displays
• In a stem-and-leaf display, each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in the display.

• It is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of data in a convenient and easily drawn form.

Advantages:
1. Easy to construct, and observations are easy to make from it.
2. It is a more informative display for relatively small data sets (fewer than 100 data points).
3. Can be used to compare more than one data set.
4. A stem-and-leaf plot can be used to determine measures of central tendency such as the mean, median and mode.

Disadvantages
1. Only useful for describing small data sets.
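Here is a minimal Python sketch of a stem-and-leaf display with a leaf unit of 1: the last digit is the leaf and the remaining digits form the stem. It assumes non-negative data and uses the 25 values from the MINITAB example that follows. It puts each stem on a single line, whereas MINITAB may split a stem over several lines.

```python
import math
from collections import defaultdict

# The 25 values from the MINITAB example (after truncation to integers).
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def stem_and_leaf(values):
    """Return display lines, one per stem, with a leaf unit of 1.

    Assumes non-negative values: each value is truncated to an integer,
    the last digit becomes the leaf and the remaining digits the stem.
    """
    groups = defaultdict(list)
    for v in sorted(math.floor(x) for x in values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(str(leaf))
    # Include empty stems between the smallest and largest stem.
    return [f"{s} | {''.join(groups[s])}"
            for s in range(min(groups), max(groups) + 1)]

for line in stem_and_leaf(data):
    print(line)
# 0 | 145899
# 1 | 000022333445556678
# 2 | 3
```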
Presenting Charts: Stem & Leaf Displays
• A MINITAB stem plot for these data (created using the "STEM" command) is shown below. MINITAB first truncates the data by rounding down to integers, then sorts them. The resulting dataset is: 1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13, 14, 14, 15, 15, 15, 16, 16, 17, 18, 23.
Presenting Charts: Stem & Leaf Displays
• N represents the sample size (25), and the leaf unit (1.0) tells us the place value of each leaf.

• The first column of the MINITAB stem-and-leaf plot shows cumulative frequencies: counted from the top down for rows above the middle value (the median), and from the bottom up for rows below it. The number in brackets, (5), is the count of values in the row containing the median, which is the thirteenth ordered value in this example, 13.0.

• The second column shows the stems, whose values are 0, 1 and 2. The third column shows the leaves.

• The stem plot illustrates that the majority of the measurements lie in the teens, with only 6 of the 25 values less than 10 and only 1 value greater than 20.

• For instance, the first row has a value of 1 in the first column because it contains a single value, and the 7th row contains the values 12, 12, 13, 13, 13.
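The first column described above can be reproduced with a short sketch, under two assumptions. First, each stem is split into lines holding two possible leaf digits (0-1, 2-3, ...), which is consistent with the 7th row containing 12, 12, 13, 13, 13. Second, the median row is the one holding the ((n + 1) // 2)-th ordered value, which is exact for odd n. MINITAB's own rules for choosing the layout are not reproduced, and the function names are hypothetical.

```python
data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def split_stem_rows(values, leaves_per_line=2):
    """Rows of (stem, leaves), each stem split into lines of
    `leaves_per_line` possible leaf digits (must divide 10).
    Assumes non-negative integers and a leaf unit of 1."""
    values = sorted(values)
    keys = [v // leaves_per_line for v in values]    # which line each value goes on
    rows = []
    for k in range(keys[0], keys[-1] + 1):           # keep empty interior lines
        leaves = "".join(str(v % 10) for v, key in zip(values, keys) if key == k)
        rows.append((k * leaves_per_line // 10, leaves))
    return rows

def depths(counts):
    """Cumulative counts from the top above the median row, from the bottom
    below it, and the median row's own count in brackets."""
    n = sum(counts)
    median_pos = (n + 1) // 2          # position of the median (n odd)
    out, cum = [], 0
    for c in counts:
        if cum < median_pos <= cum + c:
            out.append(f"({c})")
        elif cum + c < median_pos:
            out.append(str(cum + c))
        else:
            out.append(str(n - cum))
        cum += c
    return out

rows = split_stem_rows(data)
for d, (stem, leaves) in zip(depths([len(lv) for _, lv in rows]), rows):
    print(f"{d:>4}  {stem} {leaves}")
# First column: 1, 1, 3, 3, 6, 10, (5), 10, 5, 2, 1, 1
```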
Presenting Charts: Box Plots
• A box plot is a way of summarizing a set of data measured on an interval
scale. It is often used in exploratory data analysis.

• It is a type of graph which is used to show the shape of the distribution, its central value, and its variability.

• A box plot is usually built from five values: the most extreme values in the data set (the minimum and maximum), the lower and upper quartiles, and the median.
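The following sketch computes the five values behind a box plot, using the 25 values from the stem-and-leaf example. The quartiles use the (n + 1)p position rule, implemented by Python's `statistics.quantiles` with `method="exclusive"`. Other software may use a different quartile convention, so results can differ slightly.

```python
import statistics

data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), with quartiles placed at
    positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the ordered data."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary(data))  # (1, 9.5, 13.0, 15.0, 23)
```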
Presenting Charts: Box Plots
• These MINITAB box plots represent lottery payoffs for winning numbers in three time periods (May 1975-March 1976, November 1976-September 1977, and December 1980-September 1981).
Presenting Charts: Box Plots
• The median for each dataset is indicated by the black center line, and the first and third quartiles are the edges of the red box, whose length is the inter-quartile range (IQR).

• The lines (whiskers) extending from the box end at the most extreme values lying within 1.5 times the IQR of the lower or upper quartile.

• Points lying more than 1.5 times the IQR below the lower quartile or above the upper quartile are plotted individually as asterisks. These points represent potential outliers.

• In this example, the three box plots have nearly identical median values. The IQR decreases from one time period to the next, indicating reduced variability of payoffs in the second and third periods. In addition, the extreme values are closer to the median in the later time periods.
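The 1.5 × IQR rule described above can be sketched as follows. The sketch uses the 25 values from the stem-and-leaf example and (n + 1)p quartiles rather than the lottery data. The fences are Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The whiskers end at the most extreme values inside the fences, and anything outside the fences is flagged as a potential outlier.

```python
import statistics

data = [1, 4, 5, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 18, 23]

def box_plot_parts(values):
    """IQR, 1.5 x IQR fences, whisker ends and potential outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme values inside the fences
        "outliers": outliers,                    # would be plotted as asterisks
    }

print(box_plot_parts(data))
# {'iqr': 5.5, 'fences': (1.25, 23.25), 'whiskers': (4, 23), 'outliers': [1]}
```

Here the value 1 lies just below the lower fence of 1.25, so it would be plotted individually rather than covered by the whisker.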
Presenting Charts: Scatter Plot
• A scatter plot is a useful summary of a set of bivariate data (two
variables), usually drawn before working out a linear correlation
coefficient or fitting a regression line. It gives a good visual picture of the
relationship between the two variables, and aids the interpretation of the
correlation coefficient or regression model.

• Each unit contributes one point to the scatter plot, on which points are
plotted but not joined. The resulting pattern indicates the type and
strength of the relationship between the two variables.

• A scatter plot is often employed to identify potential associations between two variables, where one may be considered to be an explanatory variable (such as years of education) and another may be considered a response variable (such as annual income).
Presenting Charts: Scatter Plot
• A positive association between education and income would be indicated on a scatter plot by an upward trend (positive slope), where higher incomes correspond to higher education levels and lower incomes correspond to fewer years of education.

• A negative association would be indicated by the opposite effect (negative slope), where the most highly educated individuals would have lower incomes than the least educated individuals.

• Or there might not be any notable association, in which case the scatter plot would show no clear trend.
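To put a number on the direction of a trend, the sketch below computes the Pearson correlation coefficient r and the least-squares slope for hypothetical education and income data. A positive r and slope correspond to an upward trend and negative values to a downward trend. An r close to zero suggests no clear linear trend.

```python
import math

# Hypothetical data: years of education and annual income (thousands).
education = [8, 10, 12, 14, 16, 18]
income = [20, 24, 30, 33, 41, 45]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_slope(xs, ys):
    """Slope of the least-squares regression line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

r = pearson_r(education, income)
b = least_squares_slope(education, income)
print(f"r = {r:.3f}, slope = {b:.3f}")  # both positive: upward trend
```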
Presenting Charts: Scatter Plot
• This MINITAB scatter plot displays the association between the size of a diamond (in carats)
and its retail price (in Singapore dollars) for 48 observations.

Diagram 1 Diagram 2
Presenting Charts: Scatter Plot
• The scatter plot clearly indicates that there is a positive association between size
and price. (See diagram 1)

• A median trace plot clarifies the positive association between size and price. To
create this plot, the horizontal axis (size) is divided into equally spaced segments,
and the median of the corresponding y-values (price) is plotted above the
midpoint of each segment. The points are connected to form the median trace.
(See diagram 2)
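Here is a sketch of the median trace described above. The size and price values are hypothetical, in arbitrary units, and are not the 48 diamonds. Points at the maximum size are placed in the last segment, and segments with no points are skipped.

```python
import statistics

# Hypothetical (size, price) data, arbitrary units.
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
prices = [10, 14, 12, 20, 18, 30, 25, 40, 35]

def median_trace(xs, ys, n_segments):
    """Split the x-range into equal segments; return (segment midpoint, median y)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_segments
    groups = [[] for _ in range(n_segments)]
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_segments - 1)  # max x goes in last segment
        groups[i].append(y)
    return [(lo + (i + 0.5) * width, statistics.median(g))
            for i, g in enumerate(groups) if g]

print(median_trace(sizes, prices, 4))
# [(2.0, 12.0), (4.0, 16.0), (6.0, 24.0), (8.0, 35)]
```

Rising medians from left to right, as here, indicate a positive association.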
