
Unit 3 Descriptive Statistics


WHAT IS DESCRIPTIVE ANALYTICS?

Descriptive analytics is the process of using current and historical data to identify trends and relationships.

Descriptive analytics is relatively accessible and likely something your organization uses daily. Basic statistical software such as Microsoft Excel, or data visualization tools such as Google Charts and Tableau, can help parse data, identify trends and relationships between variables, and display that information visually.
Data analytics can be broken into four key types:
● Descriptive, which answers the question, “What happened?”
● Diagnostic, which answers the question, “Why did this happen?”
● Predictive, which answers the question, “What might happen in the future?”
● Prescriptive, which answers the question, “What should we do next?”
What Are the Advantages of Descriptive Analytics?
Now, let’s look at the stand-out benefits of descriptive analytics.
● It’s easy to do: Descriptive analysis doesn’t require great expertise or experience in statistical methods or analytics.
● There are a lot of tools available: There is a cornucopia of analytics tools to choose from, products that do most of the heavy lifting. That also helps explain why descriptive analytics is easy to perform.
● It answers the most common business performance questions: Most
stakeholders and salespeople want to know things like "How are we doing?" or
"What should we be doing differently?" Descriptive analytics provides the data
needed to answer those questions efficiently, no matter when or how often
they're asked.
But, like any other tool, descriptive analysis isn’t perfect. Here are the two chief drawbacks:
● It’s limited to simple analysis: Descriptive analysis examines the relationship
between a handful of variables, and that’s all.
● It tells you what, but not why: Descriptive analysis reports events as they
happened, not why they happened or what could possibly happen next.
Descriptive vs. Predictive vs. Prescriptive Analytics

Summary
– Descriptive Analysis: What happened?
– Predictive Analysis: What’s going to happen?
– Prescriptive Analysis: What should happen?
Function
– Descriptive Analysis: It uses data mining and data aggregation to discover historical data.
– Predictive Analysis: It looks at historical data and analyzes past data trends to predict what could happen.
– Prescriptive Analysis: It takes the conclusions gleaned from descriptive and predictive analysis and recommends the best future course of action.
Pros
– Descriptive Analysis: It’s easy to employ in daily operations. Little experience is needed.
– Predictive Analysis: It’s a valuable forecasting tool.
– Prescriptive Analysis: It offers critical insights into making the best, most informed decisions.
Cons
– Descriptive Analysis: It offers a limited view, and doesn’t go beyond the data’s surface.
– Predictive Analysis: It needs lots of historical data to work. It will never be 100% accurate.
– Prescriptive Analysis: It requires a lot of past data and often cannot account for all possible variables.
Shape – Center – Spread
• When we gather data, we want to uncover the “information” in it. One easy way to do that is to think of: “Shape – Center – Spread”
• Shape – What is the shape of the histogram?
• Center – What is the mean or median?
• Spread – What is the range or standard deviation?
Chapter 3 - Key Terms
• Measures of Central Tendency (The Center)
– Mean: µ for a population; x̄ for a sample
– Weighted Mean
– Median
– Mode
(Note the comparison of mean, median, and mode.)
Chapter 3 - Key Terms
• Measures of Dispersion (The Spread)
– Range
– Variance
(Note the computational difference between σ² and s².)
– Standard deviation
– Interquartile range
Chapter 3 - Key Terms
• Measures of Association
• Coefficient of correlation, r
– Direction of the relationship: direct (r > 0) or inverse (r < 0)
– Strength of the relationship:
When r is close to 1 or –1, the linear relationship between x and y is strong.
When r is close to 0, the linear relationship between x and y is weak.
When r = 0, there is no linear relationship between x and y.
• Coefficient of determination, r²
– The percent of total variation in y that is explained by variation in x.
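The definitions of r and r² above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name `pearson_r` and the x and y values are hypothetical, chosen only to show a strong direct relationship.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r for paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_squared = r ** 2  # share of the variation in y explained by variation in x
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A value of r near +1 here indicates a strong direct linear relationship; reversing the order of y would give r near –1.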
The Center: Mean
• Mean
– Arithmetic average = (sum all values)/# of values
» Population: µ = (Σxi)/N
» Sample: x̄ = (Σxi)/n
Be sure you know how to get the value easily from your calculator and computer software.
Problem: Calculate the average number of truck shipments from the
United States to five Canadian cities for the following data given in
thousands of bags:
Montreal, 64.0; Ottawa, 15.0; Toronto, 285.0;
Vancouver, 228.0; Winnipeg, 45.0 (Ans: 127.4)
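As a quick check of the answer, the mean can be computed directly from the five values in the problem. The variable names below are illustrative only.

```python
# Truck shipments in thousands of bags, taken from the problem statement
shipments = {
    "Montreal": 64.0,
    "Ottawa": 15.0,
    "Toronto": 285.0,
    "Vancouver": 228.0,
    "Winnipeg": 45.0,
}

# Arithmetic mean = (sum of all values) / (number of values)
mean_shipments = sum(shipments.values()) / len(shipments)
print(round(mean_shipments, 1))  # 127.4
```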
The Center: Weighted Mean
• When the data are grouped, weight each value xi by its weight wi and compute the mean using µ = (Σwixi)/Σwi
Problem: Calculate the average profit from truck shipments, United
States to Canada, for the following data given in thousands of bags
and profits per thousand bags:
City         Shipments (thousand bags)    Profit per thousand bags
Montreal     64.0                         $15.00
Ottawa       15.0                         $13.50
Toronto      285.0                        $15.50
Vancouver    228.0                        $12.00
Winnipeg     45.0                         $14.00

(Ans: $14.04 per thousand bags)
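The weighted mean for this problem can be sketched as below, treating the shipment volumes as the weights wi and the profits per thousand bags as the values xi. The variable names are illustrative.

```python
# (city, shipments in thousand bags = weight w, profit per thousand bags = value x)
data = [
    ("Montreal", 64.0, 15.00),
    ("Ottawa", 15.0, 13.50),
    ("Toronto", 285.0, 15.50),
    ("Vancouver", 228.0, 12.00),
    ("Winnipeg", 45.0, 14.00),
]

total_weighted = sum(w * x for _, w, x in data)  # sum of w_i * x_i
total_weight = sum(w for _, w, _ in data)        # sum of w_i
weighted_mean = total_weighted / total_weight

print(round(weighted_mean, 2))  # 14.04
```

Note that the unweighted average of the five profit figures would be $14.00; weighting by volume shifts the result because the cities ship very different amounts.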


The Center: Median
• To find the median:
1. Put the data in an ordered array (sorted from smallest to largest).
2A. If the data set has an ODD number of numbers, the median is the
middle value.
2B. If the data set has an EVEN number of numbers, the median is
the AVERAGE of the middle two values.
(Note that the median of an even set of data values is not
necessarily a member of the set of values.)

• The median is particularly useful if there are outliers in the data set, which otherwise tend to sway the value of an arithmetic mean.
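The odd and even cases of the median procedure can be sketched as follows. The function name `median` is illustrative; the first data set reuses the truck shipment values from the mean problem, and the second is a made-up even-sized example.

```python
def median(values):
    """Median: middle value (odd count) or average of the two middle values (even count)."""
    data = sorted(values)      # step 1: ordered array
    n = len(data)
    mid = n // 2
    if n % 2 == 1:             # step 2A: odd number of values
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # step 2B: even number of values

print(median([64.0, 15.0, 285.0, 228.0, 45.0]))  # 64.0
print(median([1, 2, 3, 4]))                      # 2.5, not a member of the set
```

For the shipment data the median (64.0) is far below the mean (127.4), which shows how the two large values pull the mean upward.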
The Center: Mode

• The mode is the most frequent value.
• While there is just one value for the mean and one value for the median, there may be more than one value for the mode of a data set.
• The mode tends to be less frequently used than the mean or the median.
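Because a data set can have more than one mode, a small sketch that returns every most-frequent value may help. The function name `modes` and the sample values are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3, 3]))  # [2, 3]  (two modes)
print(modes([4, 4, 5, 6]))     # [4]     (one mode)
```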
Shape: The “shape” of the data is called its “distribution”
• If mean = median = mode, the shape of the distribution is symmetric.
• If mode < median < mean, the distribution trails to the right: it is positively skewed.
• If mean < median < mode, the distribution trails to the left: it is negatively skewed.
• Distributions of various “shapes” have different properties and names, such as the “normal” distribution, which is also known as the “bell curve” (among mathematicians it is called the Gaussian distribution).
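The comparison of mean and median in the rules above can be turned into a rough check. This is only a rule-of-thumb sketch, not a formal skewness measure; the function name `skew_hint` is hypothetical, and the first data set reuses the truck shipment values from the mean problem.

```python
import statistics

def skew_hint(values):
    """Rule of thumb: compare the mean with the median to guess the direction of skew."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    if mean > med:
        return "positive"   # tail trails to the right
    if mean < med:
        return "negative"   # tail trails to the left
    return "symmetric"

print(skew_hint([64.0, 15.0, 285.0, 228.0, 45.0]))  # positive (mean 127.4 > median 64.0)
print(skew_hint([1, 2, 3, 4, 5]))                   # symmetric
```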
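Normal Distribution
• If average = 3500, SD = 2000, and raw score = 4500, then Z = (4500 – 3500)/2000 = +0.5
[Figure: normal curve with 68.26% of the area marked (the share within one standard deviation of the mean); a “Platykurtic” label also appears]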
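The Z value in the normal distribution example is the raw score's distance from the average, measured in standard deviations. A minimal sketch using the slide's numbers (the function name `z_score` is illustrative):

```python
def z_score(raw_score, average, sd):
    """Number of standard deviations that raw_score lies above (+) or below (-) the average."""
    return (raw_score - average) / sd

z = z_score(raw_score=4500, average=3500, sd=2000)
print(z)  # 0.5
```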
Non-Normal Distribution
[Figure: negatively skewed distribution with the mode, median, and mean marked]
[Figure: positively skewed distribution with the mode, median, and mean marked]
The Spread: Range
• The range is the distance between the smallest
and the largest data value in the set.
• Range = largest value – smallest value
• Sometimes range is reported as an interval,
anchored between the smallest and largest data
value, rather than the actual width of that
interval.
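Both ways of reporting the range can be sketched as below. The data reuse the truck shipment values (thousands of bags) from the mean problem; the variable names are illustrative.

```python
shipments = [64.0, 15.0, 285.0, 228.0, 45.0]

smallest, largest = min(shipments), max(shipments)
data_range = largest - smallest        # range as a width
range_interval = (smallest, largest)   # range as an interval

print(data_range)      # 270.0
print(range_interval)  # (15.0, 285.0)
```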
The Spread: Variance
• Variance is one of the most frequently used measures of spread:
– for a population, σ² = Σ(xi – µ)²/N = (Σxi² – Nµ²)/N
– for a sample, s² = Σ(xi – x̄)²/(n – 1) = (Σxi² – n·x̄²)/(n – 1)
• The right side of each equation is often used as a computational shortcut.
The Spread: Standard Deviation
• Since variance is given in squared units, we
often find uses for the standard deviation,
which is the square root of variance:
– for a population, σ = √[Σ(xi – µ)²/N]
– for a sample, s = √[Σ(xi – x̄)²/(n – 1)]
Be sure you know how to get the values easily from your calculator and computer software.
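As a sketch of how variance and standard deviation are computed, the block below uses the standard definitions (divide by N for a population, by n – 1 for a sample) together with the usual sum-of-squares shortcut. The data reuse the truck shipment values from the mean problem; treating them first as a population and then as a sample is purely for illustration.

```python
import math

values = [64.0, 15.0, 285.0, 228.0, 45.0]
n = len(values)
mean = sum(values) / n

# Definitional form: sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in values)
pop_var = ss / n           # population variance (divide by N)
sample_var = ss / (n - 1)  # sample variance (divide by n - 1)

# Computational shortcut: sum of squares minus n times the squared mean
ss_shortcut = sum(x ** 2 for x in values) - n * mean ** 2

pop_sd = math.sqrt(pop_var)        # population standard deviation
sample_sd = math.sqrt(sample_var)  # sample standard deviation

print(round(pop_var, 2), round(sample_var, 2))
print(round(pop_sd, 2), round(sample_sd, 2))
```

The standard library's `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev` give the same four quantities.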
Relative Position - Quartiles
• One of the most frequently used quantiles is the quartile.
• Quartiles divide the values of a data set into four subsets
of equal size, each comprising 25% of the observations.
• To find the first, second, and third quartiles:
– 1. Arrange the N data values into an array.
– 2. First quartile, Q1 = data value at position (N + 1)/4
– 3. Second quartile, Q2 = data value at position 2(N + 1)/4
– 4. Third quartile, Q3 = data value at position 3(N + 1)/4
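The position rule above can be sketched as follows. One assumption goes beyond the slide: when the position k(N + 1)/4 is not a whole number, this sketch interpolates linearly between the two neighbouring values. Other conventions exist, so results may differ slightly from a given calculator or software package. The function name `quartile` is hypothetical, and the second data set reuses the truck shipment values.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the position k(N + 1)/4 in the ordered array."""
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4   # 1-based position in the ordered array
    lower = int(pos)        # whole-number part of the position
    frac = pos - lower      # fractional part
    if lower < 1:           # position falls before the first value
        return data[0]
    if lower >= n:          # position falls at or after the last value
        return data[-1]
    # Assumption: interpolate linearly between neighbouring values
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

seven = [1, 2, 3, 4, 5, 6, 7]
print([quartile(seven, k) for k in (1, 2, 3)])  # [2, 4, 6]

shipments = [64.0, 15.0, 285.0, 228.0, 45.0]
q1, q2, q3 = (quartile(shipments, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 30.0 64.0 256.5
print(q3 - q1)     # interquartile range: 226.5
```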
