
Session 3 Descriptive Analysis I-Frequency Distribution and Cross Tabulation

Business Analytics

Uploaded by SRV TECHS

Business Analytics: Methods, Models, and Decisions

Descriptive Statistics

Statistics
• Statistics, as defined by David Hand, past president of the
Royal Statistical Society in the UK, is both the science of
uncertainty and the technology of extracting information
from data.
– Statistics involves collecting, organizing, analyzing,
interpreting, and presenting data.
– A statistic is a summary measure of data.
• Descriptive statistics refers to methods of describing and
summarizing data using tabular, visual, and quantitative
techniques.

Copyright © 2021 Pearson Education Ltd. Slide - 2


Metrics and Data Classification
• Metric - a unit of measurement that provides a way to
objectively quantify performance.
• Measurement - the act of obtaining data associated with a
metric.
• Measures - numerical values associated with a metric.



Types of Metrics
• Discrete metric - one that is derived from counting something.
– For example, a delivery is either on time or not; an order is
complete or incomplete; or an invoice can have one, two, three,
or any number of errors. Examples of discrete metrics include the
proportion of on-time deliveries, the number of incomplete orders
each day, and the number of errors per invoice.
• Continuous metrics are based on a continuous scale of
measurement.
– Any metrics involving dollars, length, time, volume, or weight, for
example, are continuous.



Measurement Scales
• Categorical (nominal) data - sorted into categories
according to specified characteristics.
• Ordinal data - can be ordered or ranked according to
some relationship to one another.
• Interval data - ordinal but have constant differences
between observations and have arbitrary zero points.
• Ratio data - continuous and have a natural zero.



Example 4.1: Classifying Data
Elements



Frequency Distributions and
Histograms
• A frequency distribution is a table that shows the number
of observations in each of several nonoverlapping groups.
• A graphical depiction of a frequency distribution in the form
of a column chart is called a histogram.



Frequency Distributions for
Categorical Data
• Categorical variables naturally define the groups in a
frequency distribution.
• To construct a frequency distribution, we need only count
the number of observations that appear in each category.
– This can be done using the Excel COUNTIF function.



Example 4.2: Constructing a Frequency Distribution
for Items in the Purchase Orders Database (1 of 2)

• List the item names in a column on the spreadsheet.

• Use the function COUNTIF($D$4:$D$97,cell_reference), where
cell_reference is the cell containing the item name.
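Outside Excel, the same one-COUNTIF-per-category logic can be sketched in a few lines of Python. This is a minimal illustration, not part of the original example: the function name `countif_frequencies` and the item values are hypothetical stand-ins for the item column in D4:D97. Unlike Excel's COUNTIF, the comparison here is case-sensitive.

```python
def countif_frequencies(data, categories):
    """Mimic one COUNTIF(range, category) per category: count exact matches."""
    return {category: sum(1 for value in data if value == category)
            for category in categories}

# Hypothetical item column (stands in for a range such as D4:D97)
items = ["Gasket", "O-Ring", "Gasket", "Bolt", "O-Ring", "Gasket"]
# The column of item names listed on the spreadsheet
categories = ["Bolt", "Gasket", "O-Ring"]

freq = countif_frequencies(items, categories)
print(freq)  # {'Bolt': 1, 'Gasket': 3, 'O-Ring': 2}
```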



Example 4.2: Constructing a Frequency Distribution
for Items in the Purchase Orders Database (2 of 2)
• Construct a column chart to visualize the frequencies.



Relative Frequency Distributions
• Relative frequency is the fraction, or proportion, of the
total.
• If a data set has n observations, the relative frequency of
category i is:
$$\text{Relative Frequency of Category } i = \frac{\text{Frequency of Category } i}{n} \qquad (4.1)$$

• We often multiply the relative frequencies by 100 to express them as percentages.
• A relative frequency distribution is a tabular summary of
the relative frequencies of all categories.



Example 4.3: Constructing a Relative Frequency
Distribution for Items in the Purchase Orders Database

• First, sum the frequencies to find the total number of
observations (the sum of the frequencies must be the same
as the total number of observations, n).
• Then divide the frequency of each category by this
value.
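The two steps of Example 4.3 translate directly into code. The sketch below assumes the frequencies are already in a dictionary keyed by category; the function name `relative_frequency_distribution` and the sample counts are hypothetical. It first checks that the frequencies sum to n, mirroring the note above, and then applies equation (4.1).

```python
def relative_frequency_distribution(frequencies, n):
    """Divide each category's frequency by n, per equation (4.1)."""
    total = sum(frequencies.values())
    # The frequencies must add up to the number of observations
    if total != n:
        raise ValueError(f"frequencies sum to {total}, expected n = {n}")
    return {category: count / n for category, count in frequencies.items()}

# Hypothetical frequencies for three categories (n = 8)
freqs = {"Bolt": 2, "Gasket": 5, "O-Ring": 1}
rel = relative_frequency_distribution(freqs, n=8)
for category, share in rel.items():
    # Show each share as a proportion and as a percentage
    print(f"{category}: {share:.4f} ({share:.1%})")
```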



Frequency Distributions for
Numerical Data
• For numerical data that consist of a small number of
discrete values, we may construct a frequency distribution
similar to the way we did for categorical data; that is, we
simply use COUNTIF to count the frequencies of each
discrete value.



Example 4.4: Frequency and Relative
Frequency Distribution for A/P Terms

• In the Purchase Orders data, the A/P terms are all whole
numbers: 15, 25, 30, and 45.
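For a discrete numerical variable such as A/P terms, counting each distinct value and dividing by n gives both the frequency and the relative frequency distribution at once. A minimal Python sketch, assuming a hypothetical list of A/P terms values (not the actual Purchase Orders data) and a hypothetical helper `discrete_distribution`:

```python
from collections import Counter

def discrete_distribution(values):
    """(value, frequency, relative frequency) for each distinct value, ascending."""
    counts = Counter(values)
    n = len(values)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

# Hypothetical A/P terms for ten orders; only 15, 25, 30, and 45 occur
ap_terms = [30, 30, 45, 15, 30, 25, 30, 45, 15, 30]
for value, freq, rel in discrete_distribution(ap_terms):
    print(f"{value:>3}  {freq:>2}  {rel:.2f}")
```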



Excel Histogram Tool
• Frequency distributions and histograms can be created
using the Analysis ToolPak in Excel.
– Click the Data Analysis tools button in the Analysis
group under the Data tab in the Excel menu bar and
select Histogram from the list.



Histogram Dialog
• Specify the Input Range corresponding to the data. If you include the
column header, then also check the Labels box so Excel knows that
the range contains a label. The Bin Range defines the groups (Excel
calls these “bins”) used for the frequency distribution.



Using Bin Ranges
• If you do not specify a bin range, Excel will automatically
determine bin values for the frequency distribution and
histogram, which often results in a rather poor choice.
• If you have discrete values, set up a column of these
values in your spreadsheet for the bin range and specify
this range in the Bin Range field.
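Excel's Histogram tool treats each bin value as an upper limit: a bin counts the observations that are greater than the previous bin value and less than or equal to its own, and a final "More" row collects anything above the last bin. The Python below is a minimal sketch of that counting rule, not Excel's own code; the function name `histogram_counts` and the sample data are hypothetical.

```python
from bisect import bisect_left

def histogram_counts(data, bins):
    """Count values per bin using upper-inclusive limits, plus a 'More' bin."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the last slot plays the role of 'More'
    for x in data:
        # bisect_left gives the first bin whose value is >= x,
        # i.e. the bin satisfying previous_bin < x <= bin
        counts[bisect_left(bins, x)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return dict(zip(labels, counts))

# Hypothetical data; the bins are the discrete A/P terms values
print(histogram_counts([15, 30, 30, 45, 25, 30], [15, 25, 30, 45]))
# {'15': 1, '25': 1, '30': 3, '45': 1, 'More': 0}
```

With discrete values as bins, each bin simply counts its own value, which is why listing the distinct values gives a clean frequency distribution.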



Example 4.5: Using the Histogram
Tool (1 of 2)
• We will create a frequency distribution and histogram for
the A/P Terms variable in the Purchase Orders database.
• We defined the bin range below the data in cells H99:H103 as follows:

Month
15
25
30
45
Example 4.5: Using the Histogram
Tool (2 of 2)
• Histogram tool results:



Grouped Frequency Distributions
• For numerical data that have many different discrete values with little repetition or are
continuous, a frequency distribution requires that we define the groups by specifying

1. the number of groups,

2. the width of each group, and

3. the upper and lower limits of each group.

• Choose between 5 and 15 groups, each with the same width.

• Choose the lower limit of the first group (LL) as a whole number smaller than the
minimum data value and the upper limit of the last group (UL) as a whole number larger
than the maximum data value.

$$\text{Group Width} = \frac{UL - LL}{\text{Number of Groups}} \qquad (4.2)$$
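Equation (4.2) and the group limits it implies can be computed directly. This is a sketch under the assumption that all groups have equal width and are laid end to end starting at LL; the function name `group_limits` and the example values (LL = 0, UL = 100, 5 groups) are hypothetical.

```python
def group_limits(ll, ul, num_groups):
    """Group width from equation (4.2) and the (lower, upper) limits of each group."""
    width = (ul - ll) / num_groups
    limits = [(ll + k * width, ll + (k + 1) * width) for k in range(num_groups)]
    return width, limits

# Hypothetical limits: LL = 0, UL = 100, 5 groups
width, limits = group_limits(0, 100, 5)
print(width)   # 20.0
print(limits)  # [(0.0, 20.0), (20.0, 40.0), (40.0, 60.0), (60.0, 80.0), (80.0, 100.0)]
```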



Example 4.6: Constructing a Frequency Distribution and
Histogram for Cost Per Order(1 of 2)
• The data range from a minimum of $68.75 to a maximum of $127,500;
set the lower limit of the first group to $0 and the upper limit of the last
group to $130,000.
• If we select 5 groups, using equation (4.2) the width of each group is

$$\frac{\$130{,}000 - \$0}{5} = \$26{,}000$$
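To see how individual orders fall into the five $26,000 groups, the sketch below lists each group's upper limit and computes the group index of a given cost. It assumes upper limits are inclusive, matching how the Excel Histogram tool treats bin values; the constant and function names are hypothetical, and only the minimum ($68.75) and maximum ($127,500) costs come from the slide.

```python
import math

LL, UL, NUM_GROUPS = 0, 130_000, 5
WIDTH = (UL - LL) / NUM_GROUPS  # equation (4.2): 26,000.0

def group_index(cost):
    """0-based index k of the group (LL + k*WIDTH, LL + (k+1)*WIDTH] holding cost.

    Upper limits are inclusive; a cost exactly equal to LL goes in the first group.
    """
    k = math.ceil((cost - LL) / WIDTH) - 1
    return max(k, 0)

upper_limits = [LL + (k + 1) * WIDTH for k in range(NUM_GROUPS)]
print(upper_limits)  # [26000.0, 52000.0, 78000.0, 104000.0, 130000.0]

# The minimum and maximum costs land in the first and last groups
print(group_index(68.75), group_index(127_500))  # 0 4
```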



Example 4.6: Constructing a Frequency Distribution and
Histogram for Cost Per Order (2 of 2)

• Ten-group histogram



Cumulative Relative Frequency
Distributions
• The cumulative relative frequency represents the
proportion of the total number of observations that fall at or
below the upper limit of each group.
• A tabular summary of cumulative relative frequencies is
called a cumulative relative frequency distribution.



Example 4.7: Computing Cumulative
Relative Frequencies
• Set the cumulative relative frequency of the first group equal to
its relative frequency. Then, for each subsequent group, add its relative
frequency to the cumulative relative frequency of the previous group.

– For example, the cumulative relative frequency in cell D3 is
computed as =D2+C3 = 0.000 + 0.4468 = 0.4468.
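The running-sum rule of Example 4.7 can be sketched with `itertools.accumulate`. This version uses hypothetical group frequencies and a hypothetical function name; it accumulates the raw counts first and divides by n once, so the last cumulative relative frequency comes out as exactly 1.0 instead of drifting slightly because of floating-point rounding.

```python
from itertools import accumulate

def cumulative_relative(frequencies):
    """Cumulative relative frequency for each group, in group order."""
    n = sum(frequencies)
    # Accumulate counts, then divide once, so the final value is exactly 1.0
    return [running / n for running in accumulate(frequencies)]

freqs = [0, 5, 3, 2]  # hypothetical group frequencies, lowest group first
print(cumulative_relative(freqs))  # [0.0, 0.5, 0.8, 1.0]
```

In the spreadsheet, the equivalent is entering =D2+C3 once and filling it down the column.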



Constructing Frequency Distributions
Using PivotTables
• In the Purchase Orders data, we can simply build a
PivotTable to find a count of the number of orders for
each item.
• For continuous numerical data, we can also use
PivotTables to construct a grouped frequency distribution.



Example 4.8: Constructing a Grouped Frequency
Distribution Using PivotTables (1 of 3)
1. Using the Purchase Orders database, create a PivotTable
as shown:



Example 4.8: Constructing a Grouped Frequency
Distribution Using PivotTables (2 of 3)
2. Click on any value in the Row Labels column, and from
the Analyze tab for PivotTable Tools, select Group Field.
Edit the dialog to start at 0, end at 130000, and group by
26000 (the group width).
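The Group Field dialog creates equal-width groups that begin at the starting value. The Python sketch below reproduces that grouping under an assumption worth checking in your Excel version: each value v goes into group floor((v - start) / by), so lower limits are inclusive, unlike the upper-inclusive bins of the Histogram tool. The function name and the cost values are hypothetical, and values above the ending value are not treated specially here.

```python
import math
from collections import Counter

def pivot_group_counts(values, start, by):
    """Count values in equal-width groups [start + k*by, start + (k+1)*by).

    Assumption: the lower limit of each group is inclusive (floor-based),
    which is how we take the PivotTable Group Field dialog to behave.
    """
    counts = Counter()
    for v in values:
        k = math.floor((v - start) / by)
        lower = start + k * by
        counts[(lower, lower + by)] += 1
    return dict(sorted(counts.items()))

# Hypothetical costs per order, grouped starting at 0 in steps of 26000
print(pivot_group_counts([68.75, 25999.5, 26000, 127500], 0, 26000))
# {(0, 26000): 2, (26000, 52000): 1, (104000, 130000): 1}
```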



Example 4.8: Constructing a Grouped Frequency
Distribution Using PivotTables (3 of 3)
• Grouped frequency distribution results:



Cross-Tabulations
• A cross-tabulation is a tabular method that displays the
number of observations in a data set for different
subcategories of two categorical variables.
– A cross-tabulation table is often called a contingency
table.
• The subcategories of the variables must be mutually
exclusive and exhaustive, meaning that each observation
can be classified into only one subcategory, and, taken
together over all subcategories, they must constitute the
complete data set.
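A cross-tabulation is a count over pairs of subcategories. The sketch below, with a hypothetical function name and hypothetical records, builds the table from (row, column) pairs. Because each record carries exactly one value per variable, the subcategories are mutually exclusive by construction; rejecting any value not among the listed subcategories enforces that they are exhaustive.

```python
from collections import Counter

def cross_tabulate(records, row_levels, col_levels):
    """Count observations for every (row level, column level) pair.

    Each record is a (row_value, col_value) tuple, so it belongs to exactly one
    cell; a value outside the listed levels raises an error, so the levels
    must cover every observation.
    """
    counts = Counter()
    for row, col in records:
        if row not in row_levels or col not in col_levels:
            raise ValueError(f"observation {(row, col)} fits no subcategory")
        counts[(row, col)] += 1
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical (region, product) records
records = [("East", "Book"), ("East", "DVD"), ("North", "Book"), ("East", "Book")]
print(cross_tabulate(records, ["East", "North"], ["Book", "DVD"]))
# {'East': {'Book': 2, 'DVD': 1}, 'North': {'Book': 1, 'DVD': 0}}
```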



Example 4.13: Constructing a Cross-
Tabulation
• Sales Transactions database

• Count the number (and compute the percentage) of books and DVDs ordered
by region (easy with PivotTables).
Counts:
Region   Book   DVD   Total
East       56    42      98
North      43    42      85
South      62    37      99
West      100    90     190
Total     261   211     472

Row percentages:
Region   Book    DVD     Total
East     57.1%   42.9%   100.0%
North    50.6%   49.4%   100.0%
South    62.6%   37.4%   100.0%
West     52.6%   47.4%   100.0%
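The percentage table is obtained by dividing each count by its row (region) total. The sketch below reproduces it from the counts on this slide; only the function name `row_percentages` is hypothetical.

```python
# Counts of books and DVDs by region, taken from the table on this slide
counts = {
    "East":  {"Book": 56,  "DVD": 42},
    "North": {"Book": 43,  "DVD": 42},
    "South": {"Book": 62,  "DVD": 37},
    "West":  {"Book": 100, "DVD": 90},
}

def row_percentages(table):
    """Each cell as a percentage of its row total, rounded to one decimal."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total, 1) for col, n in cells.items()}
    return result

for region, pcts in row_percentages(counts).items():
    print(region, pcts)
```

Dividing by the column totals (261 and 211) instead would give the share of each product's orders that came from each region.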

