Business Analytics Chapter02

The document discusses descriptive statistics and provides examples of key concepts in data analysis including raw data, proper data sets, variables, populations and samples, categorical and quantitative data, and cross-sectional and time series data. It explains how data should be organized and structured to allow for effective analysis using features in Excel like sorting, filtering, and pivot tables. Various statistical terms are also defined to lay the groundwork for descriptive analysis techniques.

Basic Business Analytics using Excel
Chapter 02
Descriptive Statistics
Raw Data: Data stored in its smallest parts

No (the whole address is in one field):
Addresses
313 173rd Blvd, Kent, WA 981215
316 66th Blvd, Kent, WA 981244
4358 23rd St, Kent, WA 981225
965 151st St, Kent, WA 981162
7900 173rd Lane, Kent, WA 981266
4047 15th Ave, Kent, WA 981228
4907 13th Ave, Kent, WA 981232
3789 4th Blvd, Seattle, WA 981152
2977 66th Lane, Seattle, WA 981171
3392 23rd St, Seattle, WA 981131

Yes (each part is in its own field):
Address            City      State   Zip
313 173rd Blvd     Kent      WA      981215
316 66th Blvd      Kent      WA      981244
4358 23rd St       Kent      WA      981225
965 151st St       Kent      WA      981162
7900 173rd Lane    Kent      WA      981266
4047 15th Ave      Kent      WA      981228
4907 13th Ave      Kent      WA      981232
3789 4th Blvd      Seattle   WA      981152
2977 66th Lane     Seattle   WA      981171
3392 23rd St       Seattle   WA      981131

Why?
Because it is easier to analyze data when it is stored in its smallest parts.
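In Excel this split is often done with the Text to Columns feature. The Python sketch below shows the same idea. It assumes that every address follows the "Street, City, State Zip" pattern used in the example. The helper `split_address` is hypothetical. The zip code is kept as text so that it is never treated as a quantity.

```python
# Hypothetical helper: assumes each address looks like "Street, City, State Zip".
def split_address(full: str) -> dict:
    street, city, state_zip = [part.strip() for part in full.split(",")]
    state, zip_code = state_zip.split()          # "WA 981215" -> "WA", "981215"
    return {"Address": street, "City": city, "State": state, "Zip": zip_code}


addresses = [
    "313 173rd Blvd, Kent, WA 981215",
    "3789 4th Blvd, Seattle, WA 981152",
]

# One record per address, with every part stored in its own field.
rows = [split_address(a) for a in addresses]
for row in rows:
    print(row["Address"], row["City"], row["State"], row["Zip"])
```

Once the city is its own field, questions such as "how many customers are in Kent?" become a simple filter or count.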
Data:
• Textbook: Facts or figures collected, analyzed and summarized
for presentation and interpretation
• Data = all the unorganized raw data in a Proper Data Set

Transaction Number   Date        Sales     SalesRep
12568                12/1/2014   $19,161   Jo
12569                12/1/2014   $15,027   Gigi
12570                12/2/2014   $12,953   Chin
12571                12/2/2014   $12,670   Jo
12572                12/2/2014   $8,893    Gigi
12573                12/3/2014   $4,667    Chin
12574                12/3/2014   $20,272   Jo
12575                12/3/2014   $20,204   Gigi
12576                12/3/2014   $17,223   Chin
Data Types & Default Alignment in Excel

• Empty Cells: Not really a data type, but a "thing" in Excel that can sometimes cause problems.
• **Refer to empty cells as "Empty Cells", not blanks.
• Why Default Alignment? Because left-aligned means Excel thinks the value is Text, and right-aligned means Excel thinks it is a Number. This matters when dealing with data because some systems mistakenly import numbers as text. Numbers stored as text do not always behave as you expect (for example, they are not added by the SUM function). The Default Alignment is a visual cue that tells us how Excel "sees" the data.
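The sketch below shows why numbers stored as text cause trouble. It assumes a simplified model of Excel's SUM over a cell range: only real numbers are added, while text and empty cells are skipped. The functions `excel_like_sum` and `to_number` are hypothetical stand-ins written for illustration. They are not Excel's implementation.

```python
# Hypothetical model of SUM over a range: adds only real numbers and
# skips text, empty cells (None), and logical values (bool).
def excel_like_sum(cells):
    return sum(c for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))


def to_number(cell):
    """Convert a number stored as text into a number; leave other cells unchanged."""
    if isinstance(cell, str):
        try:
            return float(cell.replace(",", ""))
        except ValueError:
            return cell          # genuine text stays text
    return cell


# "15027" was imported as text; None stands for an empty cell.
imported = [19161, "15027", 12953, None]

print(excel_like_sum(imported))                          # 32114 (the text value is silently skipped)
print(excel_like_sum([to_number(c) for c in imported]))  # 47141.0 after conversion
```

The first total is quietly too small, which is why left-aligned "numbers" are worth checking before analysis.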
Proper Data Set: Proper Table of Data
• A structure for your data set that is necessary so that Excel data analysis features like Sort, Filter, and PivotTables work correctly:
1. Fields in the first row (no empty cells)
2. Records or Observations in rows
3. Empty cells or Excel Row/Column Headers all the way around the data set
4. Try not to have empty cells in the data set

Transaction Number   Date        Sales     SalesRep
12568                12/1/2014   $19,161   Jo
12569                12/1/2014   $15,027   Gigi
12570                12/2/2014   $12,953   Chin
12571                12/2/2014   $12,670   Jo
12572                12/2/2014   $8,893    Gigi
12573                12/3/2014   $4,667    Chin
12574                12/3/2014   $20,272   Jo
12575                12/3/2014   $20,204   Gigi
12576                12/3/2014   $17,223   Chin
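Rules 1, 2, and 4 can be checked mechanically. The Python sketch below assumes that the data set arrives as a list of rows, with the field names in row 1 and empty cells represented as `None` or `""`. The function name `check_proper_data_set` is hypothetical. Rule 3 (empty cells or headers all the way around) depends on the worksheet layout, so this sketch does not check it.

```python
def check_proper_data_set(rows):
    """Return a list of problems; an empty list means these checks passed.

    rows[0] holds the field names; rows[1:] hold the records.
    Empty cells are assumed to be None or "".
    """
    problems = []
    header, records = rows[0], rows[1:]

    # Rule 1: every field name in the first row must be filled in.
    for col, name in enumerate(header, start=1):
        if name is None or name == "":
            problems.append(f"Empty field name in column {col}")

    # Rules 2 and 4: one record per row, each with a value for every field.
    for row_num, record in enumerate(records, start=2):
        if len(record) != len(header):
            problems.append(f"Row {row_num} has {len(record)} cells, expected {len(header)}")
        for col, value in enumerate(record, start=1):
            if value is None or value == "":
                problems.append(f"Empty cell at row {row_num}, column {col}")
    return problems


good = [
    ["Transaction Number", "Date", "Sales", "SalesRep"],
    [12568, "12/1/2014", 19161, "Jo"],
    [12569, "12/1/2014", 15027, "Gigi"],
]
bad = [
    ["Transaction Number", "", "Sales", "SalesRep"],
    [12570, "12/2/2014", None, "Chin"],
]
print(check_proper_data_set(good))  # []
print(check_proper_data_set(bad))
```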
Terms for Proper Data Set
• Primary Key / List of Unique Elements
• Variables
• Element = Entities on which data are collected. We are collecting data for each Transaction Number, so Transaction Number is the Element.
• Each row is a Record / Observation.
• All 4 are called Fields (Column Headers).


Variable, Element, Observation
• Variable
• A characteristic or quantity of interest that can take on different values
• A Variable is also known as a “Field” or “Column Header” in Database terminology
• Example: Street address, City, State, Zip for a customer
• Element
• Entities on which data are collected
• Like collecting data for an Employee or Invoice Number
• Primary Key
• When the first column in a Proper Data Set contains a “Unique List” of Elements, it is called a “Primary Key”.
• “Primary Key”, “Unique List of Elements”, “List of Unique Identifiers”, and “Distinct List” are all synonyms.
• The “Primary Key” ensures that data collected for a given element is stored in one and only one place.
• Observation or Record
• The set of values recorded for the Variables (Fields) of one Element (one row of a Proper Data Set)
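A column can serve as a Primary Key only if no value repeats. The Python sketch below tests that rule with a hypothetical helper, `duplicate_keys`, applied to columns from the example data set. Transaction Number passes the test. SalesRep fails, because each representative appears on several rows.

```python
from collections import Counter


def duplicate_keys(keys):
    """Return the values that appear more than once (an empty list means the column is a valid key)."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


transaction_numbers = [12568, 12569, 12570, 12571, 12572]
sales_reps = ["Jo", "Gigi", "Chin", "Jo", "Gigi"]

print(duplicate_keys(transaction_numbers))  # [] -> usable as a Primary Key
print(duplicate_keys(sales_reps))           # ['Gigi', 'Jo'] -> not a Primary Key
```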
Proper Data Set with a Primary Key / List of Unique Elements:
Proper Data Set with NO Primary Key / List of Unique Elements:
Using the PivotTable feature, we can create a Proper Data Set with a Primary Key (a Unique List of Products or Elements).
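A PivotTable that puts Product in Rows and Sum of Sales in Values collapses repeated products into a unique list. The Python sketch below is an analogy for that result, not Excel's PivotTable engine. The product names and amounts are hypothetical. The helper `pivot_sum` groups records by the first field and totals the second.

```python
from collections import defaultdict

# Hypothetical records: the Product column repeats, so it is not a Primary Key.
sales = [
    ("Product A", 25),
    ("Product B", 40),
    ("Product A", 15),
    ("Product C", 30),
    ("Product B", 10),
]


def pivot_sum(records):
    """Group by the first field and total the second, like Rows = Product, Values = Sum of Sales."""
    totals = defaultdict(int)
    for product, amount in records:
        totals[product] += amount
    return dict(sorted(totals.items()))  # one row per unique product


print(pivot_sum(sales))  # {'Product A': 40, 'Product B': 50, 'Product C': 30}
```

In the result, each product appears exactly once, so Product now works as a Primary Key.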
Variables
• Variable (from previous slide)
• A characteristic or quantity of interest that can take on different values
• Decision Variables
• Variables under the direct control of decision makers
• Example
• The “Quantity” Variable for a manufacturer. Managers can decide how many to make
each day.
• Random Variables (uncertain variables):
• In general, variables that are outside of the decision makers' control
• A quantity whose value is not known with certainty
• Example:
• Stock Price of Yahoo
• Number of units sold of a particular product
Variables and Variation
If you own Yahoo stock, you would be interested in the Variation in the Variable "Price (Adj Close)".
• Variation
• The difference in a variable measured over observations
• Differences over time
• Differences between customers or products
• **We will have a numerical measure for variation later…
• Role of Descriptive Statistics:
• Collect "Past Observed Values for Variables" or "Realizations of Variables" or "Raw Data" or "Data"
• Analyze Data to gain a better understanding of the variation and its impact on the business setting/situation
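"Differences over time" can be made concrete by computing the change from one observation to the next. The Python sketch below uses a hypothetical series of closing prices, not real Yahoo data. The helper `period_changes` is also hypothetical. It does not compute the numerical measure of variation that comes later in the chapter.

```python
# Hypothetical daily closing prices (illustrative numbers, not real Yahoo data).
prices = [34.10, 34.55, 33.90, 34.20, 35.05]


def period_changes(values):
    """Difference between each observation and the one before it, rounded to cents."""
    return [round(later - earlier, 2) for earlier, later in zip(values, values[1:])]


changes = period_changes(prices)
print(changes)  # [0.45, -0.65, 0.3, 0.85]
```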
Population and Sample
• Population
• All elements of interest
• Sample
• Subset of the population
• Random sampling
• A sampling method to gather a representative sample of the
population data.
• Each element comes from the same population (Target Population)
• Each element is selected independently (without bias)
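Python's standard `random` module can draw a simple random sample. In the sketch below, the population is assumed to be the transaction numbers 12568 to 12576 from the earlier example, and the seed is fixed only to make the run repeatable. `random.sample` draws without replacement, so every subset of size k is equally likely. For strictly independent draws (sampling with replacement), `random.choices` could be used instead.

```python
import random

# Hypothetical target population: every transaction number in the example data set.
population = list(range(12568, 12577))

rng = random.Random(42)               # fixed seed so the example is repeatable
sample = rng.sample(population, k=4)  # simple random sample, without replacement

print(sample)
```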
Categorical and Quantitative Data
• Quantitative Data
• “Number Data” on which numeric and arithmetic operations, such as
addition, subtraction, multiplication, and division, can be performed.
• Discrete Quantitative Data: There are gaps between numbers, like counting: 1, 2, 3…
• Continuous Quantitative Data: There are no gaps between numbers, like weight, time, and money. How precisely the number is recorded depends on the measurement instrument.
• Categorical Data
• “Not Number Data”, like Product Names or “Yes”/“No” data, on which arithmetic operations cannot be performed.
Data Terminology
Cross-sectional Data
• Data collected from several elements/entities at the same, or approximately the same, point in time.
Time Series Data
• Data collected over several time periods (Year, Month, Day, Hour…).
• Charts of time series data are common in business and economics.
• Help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.

Cross-sectional example, Sep 22, 2015:
                           GOOG      YHOO      FB        Industry
Market Cap:                426.88B   28.62B    261.91B   277.63M
Employees:                 57148     12500     10955     355
Qtrly Rev Growth (yoy):    0.11      0.15      0.39      0.15
Revenue (ttm):             69.61B    4.87B     14.64B    132.20M
Gross Margin (ttm):        0.62      0.67      0.83      0.58
EBITDA (ttm):              22.62B    541.75M   6.38B     3.47M
Operating Margin (ttm):    0.26      0.02      0.32      0.01
Net Income (ttm):          14.39B    6.94B     2.72B     N/A
EPS (ttm):                 21.22     7.2       0.98      0
P/E (ttm):                 29.34     4.22      94.47     33.33
PEG (5 yr expected):       1.22      -2.38     1.59      1.07
P/S (ttm):                 6.26      6.02      18.39     3.74
Sources of Data
• Experimental study
• A variable of interest is first identified.
• Then one or more other variables are identified and controlled or manipulated so that data can
be obtained about how they influence the variable of interest.
• Nonexperimental study or observational study: Makes no attempt to control the variables of interest.
• A survey is perhaps the most common type of observational study.
• Existing Data Sets:
• Customer Lists
• Sales or Expense Lists
• Census Data
• Weather Data
• Government sources (data.gov)
• Purchase data from companies such as: Bloomberg, Dow Jones
Sort & Filter to Organize Data
Sort
• Organize the Raw Data by sorting
• Example: Sort Sales biggest to smallest
• Sort Buttons in Data Ribbon
• Sort columns one by one, with the "Major Sort" last.
• Sort Dialog Box
• Make sure that the "Major Sort" is on top.
• Keyboard for Sort: Alt, D, S
Filter
• Must have a Proper Data Set
• Filter Button in Data Ribbon
• Great for querying a data set (extracting Observations / Records from a Proper Data Set) to get a subset of data based on a set of conditions or criteria
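Why does "Major Sort last" work when sorting one column at a time? The Python sketch below suggests an answer using the example transactions. Python's sort is stable, so rows that tie on the major key keep the order left by the earlier minor sort. The one-pass version corresponds to listing the Major Sort on top in the Sort dialog box. The filter line shows extracting records that meet a set of criteria. The criteria here are hypothetical (SalesRep is "Jo" and Sales exceed $15,000).

```python
fields = ("Transaction Number", "Date", "Sales", "SalesRep")
rows = [
    (12568, "12/1/2014", 19161, "Jo"),
    (12569, "12/1/2014", 15027, "Gigi"),
    (12570, "12/2/2014", 12953, "Chin"),
    (12571, "12/2/2014", 12670, "Jo"),
    (12572, "12/2/2014", 8893, "Gigi"),
    (12573, "12/3/2014", 4667, "Chin"),
    (12574, "12/3/2014", 20272, "Jo"),
    (12575, "12/3/2014", 20204, "Gigi"),
    (12576, "12/3/2014", 17223, "Chin"),
]
records = [dict(zip(fields, row)) for row in rows]

# Sort one column at a time: minor sort first, "Major Sort" last.
step1 = sorted(records, key=lambda r: r["Sales"], reverse=True)  # minor: Sales, biggest to smallest
sorted_records = sorted(step1, key=lambda r: r["SalesRep"])       # major: SalesRep, A to Z

# The same order in one pass, with the major key listed first.
one_pass = sorted(records, key=lambda r: (r["SalesRep"], -r["Sales"]))

# Filter: keep only the records that meet every criterion.
jo_over_15k = [r for r in records if r["SalesRep"] == "Jo" and r["Sales"] > 15000]

for r in sorted_records:
    print(r["SalesRep"], r["Sales"], r["Transaction Number"])
print([r["Transaction Number"] for r in jo_over_15k])  # [12568, 12574]
```

If the major sort were done first, the later minor sort would scramble the SalesRep grouping. That is the same mistake the slide warns about.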
Conditional Formatting for Visualizing Data
• Each cell in the highlighted range gets a logical test that comes out TRUE (apply formatting) or FALSE (do NOT apply formatting)
• The logical test can be created with built-in features or Logical Formulas
• Great for visualizing data based on a set of conditions or criteria
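The Python sketch below mirrors the per-cell logic of conditional formatting. Every value in the range gets the same logical test, and only cells whose test comes out True would be formatted. The threshold of 15000 is a hypothetical criterion. The sales values come from the example data set.

```python
sales = [19161, 15027, 12953, 12670, 8893, 4667, 20272, 20204, 17223]
threshold = 15000  # hypothetical criterion: highlight sales above $15,000

# One logical test per cell: True -> apply formatting, False -> leave the cell alone.
apply_format = [value > threshold for value in sales]

for value, flag in zip(sales, apply_format):
    print(f"{value:>7,}", "<- highlight" if flag else "")
```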
Frequency Distributions and
Column/Bar Charts for Categorical Data
• A Frequency Distribution for Categorical Data is a tabular summary which:
1. Shows the number of observations (count or frequency) in each of a set of categories (unique list from the data set)
2. Uses categories that must be Collectively Exhaustive (enough categories so nothing is left out) and Mutually Exclusive (no item can fit into more than one category)
3. Has the goal of providing information about frequencies (counts)
• Relative Frequency Distribution
• Shows the decimal value that represents "parts compared to the whole" (used in Chapter 4 for assigning probabilities)
• Percent Frequency Distribution
• Formats Relative Frequencies with the Percent Number Format
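The Python sketch below builds all three distributions from raw categorical data. The list of web sites, one entry per sale, is a small hypothetical sample. `collections.Counter` counts each observation exactly once, so the categories are mutually exclusive, and every observed value gets a category, so they are collectively exhaustive.

```python
from collections import Counter

# Hypothetical raw categorical data: the web site for each sale.
sites = [
    "amazon.com", "ebay.com", "amazon.com", "gel-boomerang.com",
    "amazon.com", "ebay.com", "coloradoboomerangs.com", "amazon.com",
]

freq = Counter(sites)                                    # frequency distribution (counts)
total = sum(freq.values())
relative = {site: n / total for site, n in freq.items()}  # parts compared to the whole

# Columns: category, frequency, relative frequency, percent frequency.
for site, n in freq.most_common():
    print(f"{site:<24}{n:>4}{relative[site]:>8.4f}{relative[site]:>9.2%}")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100%), which is a quick check that no category was missed.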
Frequency Distributions and
Column/Bar Charts for Categorical Data
• Column/Bar Chart:
• Used to show a Frequency Distribution or Relative/Percent Frequency Distribution for Categorical Data
• Counts across categories: the height of the columns conveys the count. The order of categories conveys no information
• There are "gaps" between columns to indicate that the data is categorical or a discrete quantitative variable (not a continuous quantitative variable). Columns do not touch
Frequency Distributions and
Column/Bar Charts for Categorical Data
PivotTable:
Web Site                  Frequency   % Frequency
amazon.com                11436       43.12%
coloradoboomerangs.com    6380        24.05%
ebay.com                  5810        21.90%
gel-boomerang.com         2898        10.93%
Grand Total               26524       100.00%

COUNTIFS function:
Web Site                  Frequency   % Frequency
amazon.com                11436       43.12%
coloradoboomerangs.com    6380        24.05%
ebay.com                  5810        21.90%
gel-boomerang.com         2898        10.93%
Total                     26524       100.00%

Bar Chart (Column on its side): "Boomerang Inc. 2015 Sales Frequency by Web Site" (bars: amazon.com 11436, coloradoboomerangs.com 6380, ebay.com 5810, gel-boomerang.com 2898)
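The % Frequency column is each count divided by the total, formatted as a percent. The Python sketch below reproduces that column from the Boomerang Inc. frequencies. Whether a PivotTable or COUNTIFS produced the counts, the arithmetic is the same.

```python
frequency = {
    "amazon.com": 11436,
    "coloradoboomerangs.com": 6380,
    "ebay.com": 5810,
    "gel-boomerang.com": 2898,
}

total = sum(frequency.values())  # Grand Total: 26524

# Relative frequency (count / total) shown with a percent number format.
pct = {site: f"{n / total:.2%}" for site, n in frequency.items()}

for site, n in frequency.items():
    print(f"{site:<24}{n:>7}{pct[site]:>10}")
print(f"{'Grand Total':<24}{total:>7}{'100.00%':>10}")
```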
Histograms for Quantitative Data
• Histograms
• Used to show frequency distribution of continuous quantitative data
over a set of class intervals (lower and upper limit for each category)
• Column or Bar Charts where columns are touching to indicate that the
variable is continuous
• Columns touch to indicate that no numbers can fit between classes.
"No numbers can fit between columns - no gaps"
• Height of columns conveys count
• Order of classes is important to help reveal the shape of the data, or the distribution of the data.
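The Python sketch below groups quantitative data into equal-width class intervals and counts each class, which gives the heights of the histogram columns. It uses the sales values from the example data set. The classes ($5,000 wide, starting at 0) and the hypothetical helper `histogram` are assumptions. The sketch assumes each class includes its lower limit and excludes its upper limit, and values outside the classes are ignored. Excel's FREQUENCY function instead counts a value equal to a bin's upper limit in that bin, so boundary values can land in a different class.

```python
def histogram(values, lower, width, n_classes):
    """Count values in classes [lower + i*width, lower + (i+1)*width), i = 0..n_classes-1."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - lower) // width)  # index of the class this value falls into
        if 0 <= i < n_classes:         # values outside every class are ignored
            counts[i] += 1
    return counts


sales = [19161, 15027, 12953, 12670, 8893, 4667, 20272, 20204, 17223]
counts = histogram(sales, lower=0, width=5000, n_classes=5)

# Text histogram: classes in order, bar length = count.
for i, c in enumerate(counts):
    lo = i * 5000
    print(f"{lo:>6}-{lo + 5000:<6} {'#' * c}")
```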
Mean, Median, Mode
• Mean
• Arithmetic Mean: Add them up and divide by the count
• Good for quantitative data when there are no extreme values: extreme values can make the mean look too big or too small (the Median is more representative of a typical value in that case)
• Use AVERAGE function
• Median
• Sort, then take the one in the middle: if the count is odd, take the middle value; if it is even, average the middle two.
• Marks the point in the sorted list (an actual number) where 50% of the numbers are above and 50% of the
numbers are below
• Good for quantitative data when there are extreme values (like house prices and salaries)
• Use MEDIAN function
• Mode
• One that occurs most frequently (can be bimodal, multimodal)
• Good for Categorical Data (Nominal and Ordinal)
• Use MODE.SNGL for quantitative data and COUNTIF or PivotTable for categorical or quantitative data. MODE.SNGL shows only 1 mode even if the data set is bimodal or multimodal; MODE.MULT can be used for multiple modes.
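Python's standard `statistics` module has rough counterparts to these Excel functions. The pairing is an analogy, not an exact equivalence. The sales figures come from the example data set. The unit counts and sales-rep list are hypothetical. `statistics.mode` returns a single mode (the first one encountered, in Python 3.8+). `statistics.multimode` returns every mode and also works on categorical data.

```python
import statistics

sales = [19161, 15027, 12953, 12670, 8893, 4667, 20272, 20204, 17223]

print(round(statistics.mean(sales), 2))  # 14563.33, comparable to AVERAGE
print(statistics.median(sales))          # 15027: 9 values, so the 5th sorted value
print(statistics.median(sales[:4]))      # 13990.0: 4 values, average of the middle two

units = [3, 5, 5, 7, 7, 9]                # hypothetical bimodal data
print(statistics.mode(units))             # 5: only one mode, similar to MODE.SNGL
print(statistics.multimode(units))        # [5, 7]: all modes, similar in spirit to MODE.MULT

reps = ["Jo", "Gigi", "Chin", "Jo", "Gigi", "Jo"]  # hypothetical categorical data
print(statistics.multimode(reps))         # ['Jo']
```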
