Chapter 4 - Descriptive Statistical Measures
Population - all items of interest for a particular
decision or investigation
- all married drivers over 25 years old
- all subscribers to Netflix
Sample - a subset of the population
- a list of individuals who rented a comedy from
Netflix in the past year
The purpose of sampling is to obtain sufficient
information to draw a valid inference about a
population.
We typically label the elements of a data set using
subscripted variables, $x_1, x_2, \ldots$, and so on, where $x_i$
represents the ith observation.
It is common practice in statistics to use Greek letters,
such as $\mu$ (mu), $\sigma$ (sigma), and $\pi$ (pi), to represent
population measures and italic letters such as $\bar{x}$
(called x-bar), $s$, and $p$ to represent sample statistics.
N represents the number of items in a population and n
represents the number of observations in a sample.
$\Sigma$ represents summation: $\sum x_i = x_1 + x_2 + \cdots + x_n$
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
=SUM(B2:B95)/COUNT(B2:B95)
Mean = $2,471,760/94
= $26,295.32
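The mean calculation above (a sum divided by a count) can be sketched in Python as follows. The `costs` list is hypothetical illustration data, not the Purchase Orders data; only the total of $2,471,760 over 94 observations comes from the example above.

```python
# Hypothetical order costs in dollars (NOT the actual Purchase Orders data).
costs = [12000.0, 15562.5, 15750.0, 18000.0, 90000.0]

# Mean = sum of the observations divided by the number of observations,
# mirroring =SUM(range)/COUNT(range) in Excel.
mean = sum(costs) / len(costs)
print(f"Mean of hypothetical data = ${mean:,.2f}")

# Re-check the arithmetic from the example: $2,471,760 over 94 observations.
example_mean = round(2_471_760 / 94, 2)
print(f"Example mean = ${example_mean:,.2f}")
```

Note how the single large value pulls the mean of the hypothetical data well above most of the observations.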
Median =
($15,562.50 + $15,750.00)/2
= $15,656.25
=MEDIAN(B2:B95)
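With an even number of observations, the median is the average of the two middle values once the data are sorted. A minimal sketch, using a small hypothetical list chosen so that its two middle values match the ones in the example above:

```python
import statistics

# Hypothetical data; only the two middle values echo the example above.
values = [15750.0, 9000.0, 15562.5, 82000.0]

ordered = sorted(values)
n = len(ordered)
if n % 2 == 0:
    # Even count: average the two middle observations.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd count: take the single middle observation.
    median = ordered[n // 2]

# The standard library gives the same result.
library_median = statistics.median(values)
print(median, library_median)
```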
The mode is the observation that occurs most
frequently.
The mode is most useful for data sets that contain
a relatively small number of unique values.
You can easily identify the mode from a frequency
distribution by identifying the value or group
having the largest frequency or from a histogram
by identifying the highest bar.
Excel function: =MODE.SNGL(data range).
For multiple modes: =MODE.MULT(data range)
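For comparison, Python's standard library offers rough counterparts to these two Excel functions. The lists below are hypothetical payment terms, not the Purchase Orders data.

```python
import statistics

# Hypothetical A/P terms; a single value occurs most often.
terms = [30, 15, 30, 45, 25, 30, 45]
single_mode = statistics.mode(terms)

# Hypothetical data with two values tied for the highest frequency.
tied_terms = [15, 30, 15, 30, 45]
all_modes = statistics.multimode(tied_terms)

print(single_mode)
print(all_modes)
```

`statistics.multimode` returns every value tied for the highest frequency, which is the situation `=MODE.MULT` is meant for.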
Purchase Orders
database: A/P Terms
Mode = 30 months
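Standardized value (z-score) for the observation in B2, with
the mean in cell B97 and the standard deviation in cell B98:
=(B2 - $B$97)/$B$98, or
=STANDARDIZE(B2,$B$97,$B$98).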
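The same standardization, (value - mean) / standard deviation, can be sketched in Python. The data are hypothetical, and the use of the sample standard deviation is an assumption; the slide does not say which version cell B98 holds.

```python
import statistics

# Hypothetical observations.
data = [10.0, 20.0, 30.0]

mean = statistics.mean(data)
# Assumption: sample standard deviation (n - 1 in the denominator).
std_dev = statistics.stdev(data)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in data]
print(z_scores)
```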
The coefficient of variation (CV) provides a measure of
dispersion in data relative to the mean:
$CV = \dfrac{\text{standard deviation}}{\text{mean}}$
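As a small illustration, the coefficient of variation can be computed as the standard deviation divided by the mean. The two data sets are hypothetical, and the sample standard deviation is an assumption made for this sketch.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation divided by the mean (sample version assumed)."""
    return statistics.stdev(data) / statistics.mean(data)

# Two hypothetical data sets with the same spread but different means.
low_mean = [10.0, 20.0, 30.0]
high_mean = [110.0, 120.0, 130.0]

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)
print(cv_low, cv_high)
```

Both sets have the same standard deviation, but the dispersion is much larger relative to the mean in the first set, which is what the CV captures.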
Data > Data Analysis > Descriptive Statistics
Enter Input Range
Labels (optional)
Check Summary Statistics box
Note: Results of the Analysis ToolPak do not change when
changes are made to the data.
Population mean:
Sample mean:
Population variance:
Sample variance:
Computer Repair Times
If the data are grouped into k cells in a frequency
distribution, we can use modified versions of the
formulas to estimate the mean and variance by
replacing xi with a representative value (such as
the midpoint) for all the observations in each cell.
Representative
group value
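The grouped-data estimate described above can be sketched as follows: each cell contributes its representative value (here the midpoint) weighted by its frequency. The midpoints and frequencies are hypothetical, and the sample (n - 1) form of the variance is an assumption.

```python
# Hypothetical grouped frequency distribution with k = 3 cells.
midpoints = [5.0, 15.0, 25.0]   # representative value of each cell
frequencies = [2, 5, 3]         # number of observations in each cell

n = sum(frequencies)

# Estimated mean: frequency-weighted average of the midpoints.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Estimated sample variance: frequency-weighted squared deviations over n - 1.
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

print(grouped_mean, grouped_variance)
```

These are estimates only, because every observation in a cell is treated as if it equaled the cell's midpoint.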
The proportion, denoted by p, is the fraction of
data that have a certain characteristic.
Proportions are key descriptive statistics for
categorical data, such as defects or errors in
quality control applications or consumer
preferences in market research.
Proportion of orders placed by Spacetime Technologies
=COUNTIF(A4:A97, "Spacetime Technologies")/94
= 12/94 = 0.128
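A proportion is a count of matching items divided by the total count. In the sketch below, the `suppliers` list and every name in it other than "Spacetime Technologies" are hypothetical; the final lines only re-check the 12/94 arithmetic from the example.

```python
# Hypothetical supplier column (NOT the actual Purchase Orders data).
suppliers = [
    "Spacetime Technologies",
    "Hypothetical Supplier A",
    "Spacetime Technologies",
    "Hypothetical Supplier B",
]

# Proportion = number of items with the characteristic / total number of items.
matches = sum(1 for s in suppliers if s == "Spacetime Technologies")
proportion = matches / len(suppliers)
print(proportion)

# Re-check the arithmetic from the example: 12 of 94 orders.
example_proportion = round(12 / 94, 3)
print(example_proportion)
```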
Value Field Settings include several statistical
measures:
Average
Max and Min
Product
Standard deviation
Variance
Credit Risk Data
First, create a PivotTable.
In the PivotTable Field List, move Job to the Row Labels
field and Checking and Savings to the Values field. Then
change the field settings from “Sum of Checking” and
“Sum of Savings” to the averages.
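The PivotTable steps above amount to grouping records by Job and averaging two fields within each group. A minimal standard-library sketch of the same idea, using hypothetical records rather than the Credit Risk Data:

```python
from collections import defaultdict

# Hypothetical (job, checking, savings) records.
records = [
    ("Skilled", 100.0, 500.0),
    ("Skilled", 300.0, 700.0),
    ("Unskilled", 50.0, 150.0),
]

# Collect the checking and savings balances for each job category.
by_job = defaultdict(lambda: {"checking": [], "savings": []})
for job, checking, savings in records:
    by_job[job]["checking"].append(checking)
    by_job[job]["savings"].append(savings)

# Average each field within each job, like "Average of Checking" and
# "Average of Savings" in the PivotTable.
averages = {
    job: {field: sum(vals) / len(vals) for field, vals in fields.items()}
    for job, fields in by_job.items()
}
print(averages)
```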
Two variables have a strong statistical relationship
with one another if they appear to move together.
When two variables appear to be related, you
might suspect a cause-and-effect relationship.
Sometimes, however, statistical relationships exist
even though a change in one variable is not
caused by a change in the other.
Covariance is a measure of the linear association between two
variables, X and Y. As with the variance, different formulas are used for
populations and samples.
Population covariance: $\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
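To make the population/sample distinction concrete, the sketch below computes both versions of the covariance by hand. The paired data are hypothetical; the population form divides by N and the sample form by n - 1.

```python
# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of the products of paired deviations from the means.
cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

population_cov = cross_products / n        # divide by N
sample_cov = cross_products / (n - 1)      # divide by n - 1

print(population_cov, sample_cov)
```

A positive value indicates that the two variables tend to move in the same direction.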
CORREL(array1,array2) =
COVARIANCE.P(array1,array2) / (STDEV.P(array1)*STDEV.P(array2))
and
CORREL(array1,array2) =
COVARIANCE.S(array1,array2) / (STDEV.S(array1)*STDEV.S(array2))
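The two identities above give the same correlation because the N and n - 1 divisors cancel between the covariance and the product of the standard deviations. A hand-rolled sketch with hypothetical data (the helper `correlation` is an illustrative name, not a library function):

```python
import math

def correlation(x, y, divisor):
    """Covariance divided by the product of the standard deviations,
    with all three computed using the same divisor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / divisor
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / divisor)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / divisor)
    return cov / (std_x * std_y)

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]

r_population = correlation(x, y, len(x))       # like the .P functions
r_sample = correlation(x, y, len(x) - 1)       # like the .S functions
print(r_population, r_sample)
```

Note that the denominator is the full product of the two standard deviations, so it must be grouped before the division.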
Data > Data Analysis > Correlation