MGS2150_Lecture2_Notes_II

The document covers descriptive statistics focusing on summarizing data for two variables using tables and graphical displays. It explains crosstabulation as a method for summarizing relationships between two variables, provides examples, and discusses data visualization best practices. Additionally, it highlights the importance of graphical displays in recognizing patterns and trends, and offers guidance on creating effective visual representations of data.

Uploaded by

wangyus
Copyright
© All Rights Reserved

MGS2150

Business Statistics and Application

Lecture 2 Descriptive Statistics: Tabular and Graphical Displays II

Introduction

• Summarizing Data for Two Variables Using Tables
• Summarizing Data for Two Variables Using Graphical Displays
• Data Visualization: Best Practices in Creating Effective Graphical Displays

Summarizing Data for Two Variables Using Tables

• We have focused on methods that summarize the data for one variable at a time.
• Often a manager is interested in tabular and graphical methods that help in understanding the relationship, if any, between two variables.
• Crosstabulation is a method for summarizing the data for two variables.

Crosstabulation

• A crosstabulation is a tabular summary of data for two variables.
• Crosstabulation can be used when:
– one variable is categorical and the other is quantitative,
– both variables are categorical, or
– both variables are quantitative.
• The labels shown in the margins of the table define the categories (classes) for the two variables.

Crosstabulation - Example

• Consider Zagat’s review of 300 restaurants in the Los Angeles area. The data set includes measurements on quality rating and typical meal price.
• Quality Rating is a categorical variable with categories good, very good, and excellent.
• Meal Price is a quantitative variable ranging from $10 to $49.

Restaurant   Quality Rating   Meal Price ($)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
.            .                .
.            .                .
.            .                .

Crosstabulation – Example (Cont’d)

• In the left margin, the row labels correspond to the three categories of the quality rating variable.
• In the top margin, the column labels show that the meal price data have been grouped into four classes.
• Each restaurant is associated with a cell appearing in one of the rows and one of the columns of the crosstabulation.
• To build the table, count the number of restaurants that belong to each cell.

                              Meal Price
Quality Rating   $10-19   $20-29   $30-39   $40-49   Total
Good                 42       40        2        0      84
Very Good            34       64       46        6     150
Excellent             2       14       28       22      66
Total                78      118       76       28     300
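The counting step can be sketched in Python. This is only an illustration, not the lecture's method (the slides use Excel): it uses just the four restaurants listed in the sample rows, and the names `records`, `price_class`, and `cell_counts` are hypothetical. The class width of 10 starting at $10 is chosen to match the four meal price classes in the table.

```python
from collections import Counter

# The four sample rows shown in the slides: (restaurant, quality rating, meal price in $).
records = [
    (1, "Good", 18),
    (2, "Very Good", 22),
    (3, "Good", 28),
    (4, "Excellent", 38),
]

def price_class(price, start=10, width=10):
    """Return the meal price class label for a price, e.g. 18 -> '$10-19'."""
    lower = start + ((price - start) // width) * width
    return f"${lower}-{lower + width - 1}"

# Each restaurant falls in exactly one (row, column) cell; count restaurants per cell.
cell_counts = Counter((rating, price_class(price)) for _, rating, price in records)

for (rating, price_range), count in sorted(cell_counts.items()):
    print(f"{rating:<10} {price_range:<7} {count}")
```

Applied to all 300 restaurants, the same counting would be expected to reproduce the cell frequencies in the crosstabulation.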

Crosstabulation of Quality Rating and Meal Price Data for 300 Los Angeles Restaurants

• Restaurants with higher meal prices tended to receive higher quality ratings than restaurants with lower meal prices.
– The greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20-29 range.
– Only 2 restaurants have an excellent rating and a meal price in the $10-19 range.
• The right and bottom margins of the crosstabulation provide the frequency distributions for quality rating and meal price separately.
– From the right margin, 84 restaurants have a good quality rating, 150 have a very good quality rating, and 66 have an excellent quality rating.
– The bottom margin shows the frequency distribution for the meal price variable: 78, 118, 76, and 28 restaurants in the four price classes.
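As a quick check, the two margins can be recomputed from the cell frequencies. The sketch below is a minimal Python illustration; the variable names are hypothetical, and the numbers are copied from the crosstabulation.

```python
# Cell frequencies from the crosstabulation:
# rows are quality ratings, columns are meal price classes.
price_classes = ["$10-19", "$20-29", "$30-39", "$40-49"]
crosstab = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Right margin: frequency distribution of quality rating (row totals).
row_totals = {rating: sum(counts) for rating, counts in crosstab.items()}

# Bottom margin: frequency distribution of meal price (column totals).
column_totals = [sum(column) for column in zip(*crosstab.values())]

grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(price_classes, column_totals)))
print(grand_total)
```

Both margins add up to the same grand total, the 300 restaurants in the sample.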

Construct a Crosstabulation with PivotTable

• Enter/Access Data: Open the file restaurant.
• Apply Tools:
Step 1. Select any cell in the data set and click the Insert tab.
Step 2. In the Tables group, click PivotTable. When the Create PivotTable dialog box appears, click OK.
Step 3. In the PivotTable Fields task pane, drag Quality Rating to the Rows area, drag Meal Price to the Columns area, and drag Restaurant to the Values area.
Step 4. Click Sum of Restaurant in the Values area.
Step 5. Select Value Field Settings… from the list of options that appears.
Step 6. When the Value Field Settings dialog box appears, under Summarize value field by, choose Count, then click OK.

Initial PivotTable Fields Task Pane and PivotTable for the Restaurant Data

Final PivotTable for the Restaurant Data

• Editing Options:
Step 1. Right-click cell B4 in the PivotTable or any other cell containing meal prices.
Step 2. Choose Group from the list of options that appears.
Step 3. When the Grouping dialog box appears, enter 10 in the Starting at: box, enter 49 in the Ending at: box, and enter 10 in the By: box; then click OK.
Step 4. Right-click Excellent in cell A5, select Move, and click Move “Excellent” to End.

Final PivotTable for the Restaurant Data (Cont’d)

Crosstabulation: Row Percentages

• Converting the entries in a crosstabulation into row percentages or column percentages can provide more insight into the relationship between the two variables.
• For row percentages, divide each frequency by its corresponding row total.
• Example - Zagat’s review: Each row in the table is a percent frequency distribution of meal price for one of the quality rating categories.

Row Percentages for Each Quality Rating Category
                              Meal Price
Quality Rating   $10-19   $20-29   $30-39   $40-49    Total
Good              50.0%    47.6%     2.4%     0.0%   100.0%
Very Good         22.7%    42.7%    30.6%     4.0%   100.0%
Excellent          3.0%    21.2%    42.4%    33.4%   100.0%

• Example calculation: the number of restaurants with the lowest quality rating (good) and a meal price of $10-19, divided by the total number of good restaurants: (42/84) * 100 = 50%
• Of the restaurants with the lowest quality rating (good), 50% are in the lowest meal price class ($10-19).
• Of the restaurants with an excellent quality rating, the greatest percentages are in the more expensive classes ($30-39 and $40-49).
• Restaurants with higher meal prices tended to receive higher quality ratings.
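The row-percentage rule (each frequency divided by its row total) can be sketched in Python as follows. The names are hypothetical and the frequencies are those of the restaurant crosstabulation. Note that rounding each cell independently gives 30.7% for Very Good in the $30-39 class and 33.3% for Excellent in the $40-49 class, where the table shows 30.6% and 33.4%; the table values were presumably adjusted so that each row sums to exactly 100.0%.

```python
# Cell frequencies: rows are quality ratings, columns are the four meal price classes.
crosstab = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Divide each frequency by its row total and express the result as a percentage.
row_percentages = {
    rating: [round(100 * count / sum(counts), 1) for count in counts]
    for rating, counts in crosstab.items()
}

for rating, percentages in row_percentages.items():
    print(rating, percentages)
```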

Crosstabulation: Column Percentages

• For column percentages, divide each frequency in the crosstabulation by its corresponding column total.
• Example - Zagat’s review: Each column in the table is a percent frequency distribution of quality rating for one of the classes of meal prices.
– Only 2.6% of restaurants with low ($10-19) meal prices have an excellent quality rating.
– As meal price increases, the percent frequency distribution shifts toward higher quality ratings.

                              Meal Price
Quality Rating   $10-19   $20-29   $30-39   $40-49
Good              53.8%    33.9%     2.6%     0.0%
Very Good         43.6%    54.2%    60.5%    21.4%
Excellent          2.6%    11.9%    36.8%    78.6%
Total            100.0%   100.0%   100.0%   100.0%

Example calculation (excellent rating, $10-19 class): (2/78) * 100 = 2.6%
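Column percentages follow the same pattern with the roles of rows and columns swapped. A minimal Python sketch, assuming the frequencies are stored one meal price class at a time (the names `ratings`, `columns`, and `column_percentages` are hypothetical):

```python
# Frequencies stored by meal price class; each list is ordered Good, Very Good, Excellent.
ratings = ["Good", "Very Good", "Excellent"]
columns = {
    "$10-19": [42, 34, 2],
    "$20-29": [40, 64, 14],
    "$30-39": [2, 46, 28],
    "$40-49": [0, 6, 22],
}

# Divide each frequency by its column total and express the result as a percentage.
column_percentages = {
    price_class: [round(100 * count / sum(counts), 1) for count in counts]
    for price_class, counts in columns.items()
}

for price_class, percentages in column_percentages.items():
    print(price_class, dict(zip(ratings, percentages)))
```

Reading across the Excellent entries (2.6%, 11.9%, 36.8%, 78.6%) shows the shift toward higher quality ratings as meal price increases.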


Crosstabulation: Simpson’s Paradox

• Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
• Conclusions drawn from two or more separate crosstabulations can be reversed when the data are aggregated into a single crosstabulation. The reversal of conclusions based on aggregated and disaggregated data is called Simpson’s paradox.
• Be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
– Consider whether a hidden variable could affect the results such that separate, disaggregated crosstabulations provide a different and possibly better insight and conclusion.

Simpson’s Paradox: Example

• Western University has only one women’s softball scholarship remaining for the coming year. The final two players that Western is considering are Allison and Emily. The coaching staff has concluded that the speed and defensive skills are virtually identical for the two players, and that the final decision will be based on which player has the better batting average. Crosstabulations of each player’s batting performance in their junior and senior years of high school are as follows:

                       Allison                     Emily
Outcome          Junior      Senior         Junior       Senior
Hit              15 (38%)    75 (30%)       70 (35%)     35 (29%)
No Hit           25 (63%)    175 (70%)      130 (65%)    85 (71%)
Total At-Bats    40 (100%)   250 (100%)     200 (100%)   120 (100%)

• Allison had the higher batting average in both her junior year and her senior year.

Simpson’s Paradox: Example (Cont’d)

• When the results are aggregated for each player, a different picture emerges.
– Crosstabulation of each player’s batting performance over the combined two years (junior and senior) of high school:

Combined 2-Year Batting
Outcome          Allison      Emily
Hit               90 (31%)    105 (33%)
No Hit           200 (69%)    215 (67%)
Total At-Bats    290 (100%)   320 (100%)

• Based on the aggregated crosstabulation, Emily has the higher batting average over the combined two years.
• This result contradicts the conclusion reached with the unaggregated crosstabulations.
• The decision maker will have to decide whether the unaggregated or the aggregated form of the crosstabulation is more helpful in reaching a conclusion.
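The reversal can be verified directly from the hit and at-bat counts. The Python sketch below is illustrative only (the names `batting`, `yearly`, and `combined` are hypothetical). It also hints at why the reversal happens: most of Allison's at-bats (250 of 290) come from her senior year, when both players hit less well, while most of Emily's (200 of 320) come from her junior year. School year acts as the hidden variable.

```python
# (hits, at-bats) by player and school year, taken from the crosstabulations.
batting = {
    "Allison": {"Junior": (15, 40), "Senior": (75, 250)},
    "Emily":   {"Junior": (70, 200), "Senior": (35, 120)},
}

# Batting average per player per year (disaggregated view).
yearly = {
    player: {year: hits / at_bats for year, (hits, at_bats) in years.items()}
    for player, years in batting.items()
}

# Batting average per player over both years combined (aggregated view).
combined = {}
for player, years in batting.items():
    total_hits = sum(hits for hits, _ in years.values())
    total_at_bats = sum(at_bats for _, at_bats in years.values())
    combined[player] = total_hits / total_at_bats

for player in batting:
    print(player, {y: round(a, 3) for y, a in yearly[player].items()},
          "combined:", round(combined[player], 3))
```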

Summarizing Data for Two Variables Using Graphical Displays

• A graphical display is often more useful than a table for recognizing patterns and trends in the data.
• Displaying data in creative ways can lead to powerful insights and allow us to make “common-sense inferences” based on our ability to visually compare, contrast, and recognize patterns.
• Scatter diagrams and trendlines
• Side-by-side and stacked bar charts

Scatter Diagram and Trendline

• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
– One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
– The general pattern of the plotted points suggests the overall relationship between the variables.
• A trendline is a line that provides an approximation of the relationship.

Types of Relationships Depicted by Scatter Diagram

• Some general scatter diagram patterns and the types of relationships they suggest

Scatter Diagram and Trendline: Example

• Example: The managers of an electronics store want to investigate whether a relationship exists between advertising and sales.

Week   Number of Commercials   Sales Volume ($100s)
1      2                       50
2      5                       57
3      1                       41
.      .                       .
.      .                       .

Scatter Diagram and Trendline: Example (Cont’d)

• The horizontal axis (x) shows the number of commercials; the vertical axis (y) shows sales.
• The scatter diagram indicates a positive relationship between the number of commercials and sales.
– Higher sales are associated with a greater number of commercials.
• The relationship does not form a perfectly straight line. However, the general pattern of the points and the trendline suggest that the overall relationship is positive.
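The slides do not say how the trendline is computed. A common choice, and the one Excel uses for its linear trendline, is the least-squares line. The sketch below fits such a line in plain Python using only the three weeks listed in the sample rows, so the resulting slope and intercept are illustrative and are not the trendline for the full data set. All variable names are hypothetical.

```python
# The three weeks shown in the slides: number of commercials (x) and sales in $100s (y).
commercials = [2, 5, 1]
sales = [50, 57, 41]

n = len(commercials)
mean_x = sum(commercials) / n
mean_y = sum(sales) / n

# Least-squares slope: sum of cross-deviations divided by sum of squared x-deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(commercials, sales))
s_xx = sum((x - mean_x) ** 2 for x in commercials)
slope = s_xy / s_xx

# The least-squares line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"sales ~ {intercept:.2f} + {slope:.2f} * commercials")
```

A positive slope corresponds to the positive relationship seen in the scatter diagram.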

Excel: Construct a Scatter Diagram

• Enter/Access Data: Open the file electronics.
• Apply Tools:
Step 1. Select cells B1:C11.
Step 2. Click the Insert tab.
Step 3. In the Charts group, click Insert Scatter (X,Y) or Bubble Chart.
Step 4. When the list of scatter diagram subtypes appears, click Scatter.
• Editing Options:
Step 1. Click the Chart Title and replace it with Scatter Chart for the San Francisco Electronics Store.
Step 2. Click the Chart Elements button (located next to the top-right corner of the chart).

Excel: Construct a Scatter Diagram (Cont’d)

Step 3. When the list of chart elements appears, select the check box for Axis Titles, deselect the check box for Gridlines, and select the check box for Trendline.
Step 4. Click the horizontal Axis Title placeholder and replace it with Number of Commercials.
Step 5. Click the vertical Axis Title placeholder and replace it with Sales ($100s).

Side-by-Side Bar Charts

• A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same display.
• Example - Zagat’s restaurant reviews:
– Each cluster of bars represents one of the four classes of meal prices.
– Each bar within a cluster represents one of the three quality rating categories. The color of each bar indicates the quality rating.
– Each bar extends to the point on the vertical axis that represents the frequency with which that quality rating occurred in that meal price class.
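The cluster-and-bar layout described above can be sketched as a plain-text chart in Python. This is a rough illustration only: the bars are drawn horizontally with `#` characters rather than vertically in color, and the function name `side_by_side_rows` and its `scale` parameter are hypothetical.

```python
price_classes = ["$10-19", "$20-29", "$30-39", "$40-49"]
crosstab = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

def side_by_side_rows(crosstab, price_classes, scale=2):
    """Return text lines: one cluster per price class, one bar per quality rating."""
    lines = []
    for j, price_class in enumerate(price_classes):
        lines.append(price_class)  # cluster label
        for rating, counts in crosstab.items():
            bar = "#" * (counts[j] // scale)  # bar length proportional to frequency
            lines.append(f"  {rating:<10} {bar} {counts[j]}")
    return lines

chart_lines = side_by_side_rows(crosstab, price_classes)
print("\n".join(chart_lines))
```

Within each cluster the three bars can be compared directly, and scanning from the first cluster to the last shows how each rating's frequency changes with price.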

Side-by-Side Bar Charts – Example

• As the price increases (left to right), the height of the light blue bars (good rating) decreases and the height of the dark blue bars (excellent rating) generally increases.
→ As price increases, the quality rating tends to be better.
• The very good rating tends to be more prominent in the middle price categories.

Excel: Construct a Side-by-Side Bar Chart

• Apply Tools:
Step 1. Select any cell in the PivotTable report.
Step 2. Click the Insert tab.
Step 3. In the Charts group, click Recommended Charts.
Step 4. Click OK.
Step 5. Click the Design tab.
Step 6. In the Data group, click Switch Row/Column to display Meal Price ($) on the horizontal axis.

Stacked Bar Chart

• A stacked bar chart is a bar chart in which each bar is broken into rectangular segments of different colors showing the relative frequency of each class.
• Because percent frequencies are displayed, all bars are the same height, extending to the 100% mark.
• Example - Zagat’s restaurant reviews: As price increases, the quality rating tends to be better.
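To see why every bar reaches the 100% mark, it can help to compute where each segment starts and ends. The Python sketch below is an illustration with hypothetical names (`stacked_segments`, `stacked`); it stacks the quality ratings within each meal price class in the order Good, Very Good, Excellent.

```python
# Frequencies by meal price class; each list is ordered Good, Very Good, Excellent.
columns = {
    "$10-19": [42, 34, 2],
    "$20-29": [40, 64, 14],
    "$30-39": [2, 46, 28],
    "$40-49": [0, 6, 22],
}

def stacked_segments(counts):
    """Return (bottom, top) percent bounds for each segment of one 100% stacked bar."""
    total = sum(counts)
    segments = []
    bottom = 0.0
    for count in counts:
        top = bottom + 100 * count / total  # segment height is the percent frequency
        segments.append((bottom, top))
        bottom = top  # the next segment starts where this one ends
    return segments

stacked = {price_class: stacked_segments(counts) for price_class, counts in columns.items()}

for price_class, segments in stacked.items():
    print(price_class, [(round(b, 1), round(t, 1)) for b, t in segments])
```

Because the segment heights in each bar are percent frequencies of the same column total, the last segment always ends at 100%.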

Excel: Construct a Stacked Bar Chart

• You can change the side-by-side bar chart to a stacked bar chart using the following steps.
Step 1. Click the bar chart, click the Design tab on the Ribbon, and in the Type group, click Change Chart Type.
Step 2. When the Change Chart Type dialog box appears, select the 100% Stacked Column option and click OK.

Data Visualization

• Data visualization is the use of graphical displays to summarize and present data.
• The goal is to communicate, as effectively and clearly as possible, the key information about the data.
• The data-ink ratio is the ratio of the ink needed to convey the meaning to the audience (known as “data-ink”) to the total ink used for the table or chart.

Creating Effective Graphical Displays

• Give the display a clear and concise title.

• Keep the display simple by maximizing the data-ink ratio.

• Clearly label each axis and provide the units of measure.

• If colors are used, make sure they are distinct.

• If multiple colors or lines are used, provide a legend.

Choosing the Type of Graphical Display

• Bar Chart: used to show the frequency distribution and relative frequency distribution for categorical data
• Pie Chart: used to show the relative frequency and percent frequency for categorical data
• Dot Plot: used to show the distribution of quantitative data over the entire range of the data
• Histogram: used to show the frequency distribution for quantitative data over a set of class intervals
• Stem-and-Leaf Display: used to show both the rank order and shape of the distribution for quantitative data

Summary of Graphical Displays Used to Make Comparisons and Show Relationships

To Make Comparisons
• Side-by-Side Bar Chart: used to compare two variables
• Stacked Bar Chart: used to compare the relative frequency or percent frequency of two categorical variables

To Show Relationships
• Scatter Diagram: used to show the relationship between two quantitative variables
• Trendline: used to approximate the relationship of data in a scatter diagram

Data Dashboards

• A data dashboard is a widely used data visualization tool.
• It organizes and presents key performance indicators (KPIs) used to monitor an organization or process.
• It provides timely summary information that is easy to read, understand, and interpret.
• Additional guidelines for data dashboards include:
– Minimize the need for screen scrolling.
– Avoid unnecessary use of color or 3D displays.
– Use borders between charts to improve readability.

Data Dashboard: Example

The data dashboard was developed to monitor the performance of the call center. It combines several displays to monitor the call center’s KPIs.

A Summary of Tabular and Graphical Displays of Data
