MATH30-6-Lecture-3

The document outlines the objectives and various measures used in univariate analysis, including measures of central tendency (mean, median, mode), measures of variation (range, variance, standard deviation), and measures of shape (skewness, kurtosis). It provides definitions, characteristics, and examples for each measure, emphasizing their applications in describing and interpreting data. Additionally, it discusses the importance of understanding data dispersion and the implications of different statistical measures in analyzing datasets.
Copyright © All Rights Reserved
Univariate Analysis

MATH30-6
Probability and Statistics
Objectives
At the end of the lesson, students are expected to
• define and differentiate various measures for describing data;
• describe a given set of data using these measures; and
• interpret the values that arise from the computations.
Measures of Describing Data
• Measure of Central Tendency
- Also known as Measure of Center or Measure of Central Location
- Locates the center of the data set using the mean, median, or mode

• Measure of Position
- Locates the value at a given relative position (the kth element) in the ordered data
- Also called the quantiles or fractiles of the distribution
Measures of Describing Data
• Measure of Variation
- Measures how the data are spread about the mean.

• Measure of Shape
- Measures the symmetry (skewness) and peakedness (kurtosis) of a distribution.
The Mean
• Most widely used measure for describing interval- and ratio-level data.
• May be classified as
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Trimmed mean
- Quadratic or Root Mean Square (RMS)
Arithmetic Mean
For Discrete Case

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Arithmetic Mean
Characteristics
• All values are used.
• It is unique.
• The arithmetic mean is the only measure of central
tendency where the sum of the deviations of each
value from the mean is zero.
• It is calculated by summing the values and dividing by
the number of values.
• Every set of interval-level and ratio-level data has a
mean.
• The mean is affected by unusually large or small data
values.
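Two of the characteristics above (deviations from the mean sum to zero, and the mean is pulled by extreme values) can be checked numerically. The sketch below uses a small hypothetical data set, not data from the lecture, and a hypothetical helper named `arithmetic_mean`.

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical illustrative data (not from the lecture).
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
xbar = arithmetic_mean(data)
deviations = [x - xbar for x in data]

print(xbar)             # 18.0
print(sum(deviations))  # 0.0: deviations from the mean cancel out

# Adding one unusually large value drags the mean upward.
with_outlier = arithmetic_mean(data + [500.0])
print(with_outlier)
```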
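Arithmetic Mean
For Continuous Case
For a function $f(x)$ on the interval $a \le x \le b$ (for a periodic signal, one full period):

$\bar{f} = \frac{1}{b-a}\int_{a}^{b} f(x)\,dx$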
Examples:
Determine the average voltage of a signal with each of the following waveforms.

1.

2.

3.
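The waveform definitions for these examples did not survive extraction, so the sketch below uses hypothetical rectified sine waves with an assumed peak voltage `Vm`. It approximates the continuous-case average $\frac{1}{b-a}\int_a^b f(x)\,dx$ with the composite trapezoidal rule (a numerical stand-in for the exact integral) and compares the results with the known closed forms $2V_m/\pi$ and $V_m/\pi$.

```python
import math
from typing import Callable


def average_value(f: Callable[[float], float], a: float, b: float, steps: int = 10_000) -> float:
    """Approximate (1 / (b - a)) * integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h / (b - a)


Vm = 10.0  # hypothetical peak voltage

# Full-wave rectified sine over one period (0 to pi): exact average is 2*Vm/pi.
full_wave = average_value(lambda t: Vm * abs(math.sin(t)), 0.0, math.pi)

# Half-wave rectified sine over one period (0 to 2*pi): exact average is Vm/pi.
half_wave = average_value(lambda t: Vm * max(math.sin(t), 0.0), 0.0, 2 * math.pi)

print(f"full-wave: {full_wave:.4f} V (exact {2 * Vm / math.pi:.4f} V)")
print(f"half-wave: {half_wave:.4f} V (exact {Vm / math.pi:.4f} V)")
```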
Weighted Mean
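The weighted mean of a set of numbers $x_1, x_2, \ldots, x_n$ with corresponding weights $w_1, w_2, \ldots, w_n$ is computed from the following formula:

$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$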
Weighted Mean
Example:
1. The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
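A minimal Python sketch of the weighted-mean formula applied to the Carter Construction example, using the number of employees at each rate as the weights. The helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


rates = [16.50, 19.00, 25.00]  # dollars per hour
employees = [14, 10, 2]        # number of employees at each rate (the weights)

mean_rate = weighted_mean(rates, employees)
print(round(mean_rate, 2))  # 18.12, i.e. 471 / 26 dollars per hour
```

Note that the unweighted mean of the three rates, about $20.17, would overstate the typical wage because only 2 of the 26 employees earn $25.00.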
Geometric Mean
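• Used to average factors that are multiplied together, such as successive growth or gain factors
• Examples include interest and discount rates and amplifier gain factors.
• $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ for positive values $x_1, \ldots, x_n$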
Geometric Mean
Examples:
1. What is the geometric mean of the numbers 5, 9, 25, and 80?

2. ₱5000 is invested for five years at a bank whose annual interest rates are 5% in the first year and 6%, 8%, 10%, and 12% in the 2nd, 3rd, 4th, and 5th years, respectively. Find the average annual interest rate the bank offers over the 5 years.
Geometric Mean
Examples:
3. A four-stage amplifier with gains of 10, 30, 50, and 80 is to be replaced by an amplifier with four identical stages. What is the gain of each stage in the new configuration?

4. A cascaded system requires two stages with current gains of 50 and 90. What current gain must each transistor have if the stages are replaced by a Darlington pair (two identical bipolar junction transistors [BJTs] connected in series)?
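The four geometric-mean examples above can be checked with the short Python sketch below. It computes the geometric mean through logarithms (an implementation choice to avoid overflow, not something the lecture prescribes). For Example 2 it averages the growth factors $1 + r$ rather than the rates themselves, and the Darlington answer assumes the pair's overall gain is the product of the two transistor gains. The helper name `geometric_mean` is hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Example 1: fourth root of 5 * 9 * 25 * 80 = 90,000
g1 = geometric_mean([5, 9, 25, 80])

# Example 2: average the yearly growth factors, then convert back to a rate.
factors = [1.05, 1.06, 1.08, 1.10, 1.12]
avg_rate = geometric_mean(factors) - 1

# Example 3: four identical stages with the same overall gain as 10, 30, 50, 80.
stage_gain = geometric_mean([10, 30, 50, 80])

# Example 4: two identical transistors with the same overall gain as 50 and 90.
beta = geometric_mean([50, 90])

print(f"{g1:.4f}")          # 17.3205
print(f"{avg_rate:.2%}")    # 8.17%
print(f"{stage_gain:.2f}")  # 33.10
print(f"{beta:.2f}")        # 67.08
```

As a sanity check for Example 2, compounding ₱5000 at the averaged rate for five years gives the same final amount as applying the five actual rates in turn; the arithmetic mean of the rates (8.2%) would not.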
Harmonic Mean
• Used to average rates, such as speeds, when each rate applies over the same distance (for example, the two legs of a round trip).
• $H = \frac{n}{\sum_{i=1}^{n} 1/x_i}$
Harmonic Mean
Examples:
1. If Dan’s speed in traveling from home to work is 30 kph and going home from work is 20 kph, find his average speed for the round trip.

2. Find Edgar’s average speed traveling from home to school over a period of 5 days when his speeds are recorded as follows.
Day          1   2   3   4   5
Speed (kph)  25  30  18  26  15
Harmonic Mean
Examples:
3. Going to work, Mark travels at a rate of 30 kph. How
fast should he travel going home to obtain an average
speed of 35 kph?
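A minimal Python sketch of the three harmonic-mean examples, assuming (as the examples imply) that every trip covers the same distance. Example 3 solves $2 / (1/30 + 1/v) = 35$ for $v$. The helper name `harmonic_mean` is hypothetical.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)


# Example 1: equal distances at 30 kph and 20 kph.
dan = harmonic_mean([30, 20])

# Example 2: five equal-distance trips.
edgar = harmonic_mean([25, 30, 18, 26, 15])

# Example 3: 2 / (1/30 + 1/v) = 35  =>  1/v = 2/35 - 1/30
v_home = 1 / (2 / 35 - 1 / 30)

print(f"{dan:.2f} {edgar:.2f} {v_home:.2f}")  # 24.00 21.37 42.00
```

The arithmetic mean of 30 and 20 (25 kph) overstates Dan's average speed because he spends more time on the slower leg.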
Trimmed Mean
• Computed by removing the largest and smallest values (possible outliers) of the distribution and taking the arithmetic mean of the rest
• The same percentage is "trimmed away" from both the largest and the smallest values.
• For example, the 10% trimmed mean is found by eliminating the largest 10% and the smallest 10% of the values and computing the average of the remaining values.
Trimmed Mean
Example

1. Determine the 10% trimmed mean of the data set:
1, 2, 3, 4, 5, 6, 8, 9, 10, 20
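A minimal sketch of the 10% trimmed mean for the example data. It assumes the number of values cut from each end is `floor(proportion * n)`; when that product is not a whole number, other texts round or interpolate instead. The helper name `trimmed_mean` is hypothetical.

```python
import math


def trimmed_mean(values, proportion):
    """Drop floor(proportion * n) values from each end of the sorted data, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # values removed from EACH end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 20]
print(trimmed_mean(data, 0.10))  # 5.875: mean of 2, 3, 4, 5, 6, 8, 9, 10
print(sum(data) / len(data))     # 6.8: the untrimmed mean, pulled up by 20
```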
Quadratic Mean
• Also called the root mean square (RMS) of a quantity.
• Useful for quantities that alternate between positive and negative values, such as a sinusoidal voltage, whose arithmetic mean over a full period is zero.
• $x_{\text{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
Quadratic Mean
Examples:
1. Find the quadratic mean of the values 1, 2, −1, 2, −1, 4, −1, 3.

2. Determine the quadratic (RMS) voltage of a signal with each of the following waveforms.
a)
b)
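The sketch below computes the discrete RMS for Example 1 and, because the waveforms for Example 2 did not survive extraction, the continuous RMS of a hypothetical sine wave with assumed peak `Vm`. The continuous case approximates $\sqrt{\frac{1}{b-a}\int_a^b f(t)^2\,dt}$ with the trapezoidal rule; the helper names are hypothetical.

```python
import math


def rms(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_continuous(f, a, b, steps=10_000):
    """Square root of the average of f(t)**2 over [a, b], via the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) ** 2 + f(b) ** 2)
    for k in range(1, steps):
        total += f(a + k * h) ** 2
    return math.sqrt(total * h / (b - a))


q1 = rms([1, 2, -1, 2, -1, 4, -1, 3])  # sqrt(37 / 8)

Vm = 10.0  # hypothetical peak voltage
v_rms = rms_continuous(lambda t: Vm * math.sin(t), 0.0, 2 * math.pi)

print(f"{q1:.4f}")     # 2.1506
print(f"{v_rms:.4f}")  # 7.0711, i.e. Vm / sqrt(2)
```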
The Median
• The midpoint of the values after they have been
ordered from the smallest to largest
• There are as many values above the median as below it
in the data array.
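• For an even number of values, the median is the arithmetic average of the two middle values.

Sample median (with the data ordered so that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$):
$\tilde{x} = x_{((n+1)/2)}$ if $n$ is odd,

$\tilde{x} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$ if $n$ is even.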
The Median
Characteristics
• There is a unique median for each data set.
• It is not affected by extremely large or small values and
is therefore a valuable measure of central tendency
when such values occur.
• It can be computed for ratio-level, interval-level, and
ordinal-level data.
• It can be computed for an open-ended frequency
distribution if the median does not lie in an open-
ended class.
The Median
Example:
1. Find the median of
1.8, 2.1, 1.7, 1.6, 0.9, 2.7, and 1.8
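A minimal sketch of the odd/even median rule, applied to the example data. The helper name `median` is hypothetical (Python's own `statistics.median` gives the same results).

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)-th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]))  # 1.8
print(median([1, 3, 6, 10, 15, 21, 28, 36]))        # 12.5 (even n)

# Replacing 2.7 with an extreme value leaves the median unchanged.
print(median([1.8, 2.1, 1.7, 1.6, 0.9, 270, 1.8]))  # 1.8
```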
The Mode
• The value of the observation that appears most
frequently
The Mode
Characteristics
• Used when you want to find the most
occurring/frequent score
• A quick approximation of the average
• An inspection average
• The least reliable of the three measures, because a data set may have no mode (every value occurs equally often) or more than one mode
• The only measure of central location that can be used
for nominal data
• Commonly used in polls
• A distribution with two modes is bimodal; with three, trimodal; in general, multimodal.
The Mode
Example:
1. In a certain poll, the following data were recorded, coded as 1 − Yes, 2 − No, 0 − Undecided. What is the modal choice?
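The recorded poll responses are missing from the source, so the sketch below uses a hypothetical list of coded responses. It returns every most-frequent value, so it also covers the bimodal case and the "no mode" case described in the characteristics above. The helper name `modes` and the response data are hypothetical.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; [] when every distinct value occurs equally often (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    if len(winners) == len(counts) and len(counts) > 1:
        return []  # no single value stands out
    return winners


LABELS = {1: "Yes", 2: "No", 0: "Undecided"}
responses = [1, 2, 1, 0, 1, 2, 2, 1, 0, 1]  # hypothetical coded responses

print([LABELS[m] for m in modes(responses)])  # ['Yes']
print(modes([3, 3, 5, 5, 7]))                 # [3, 5]: bimodal
print(modes([1, 2, 3]))                       # []: no mode
```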
Measures of Location
• Quantiles (or Fractiles) are points taken at regular
intervals from the cumulative distribution function of a
random variable.
• Dividing ordered data into $q$ essentially equal-sized subsets is the motivation for $q$-quantiles; the quantiles are the data values marking the boundaries between consecutive subsets.
• There are $q - 1$ of the $q$-quantiles, one for each integer $k$ satisfying $0 < k < q$.
Measures of Location
Quartiles
• Dividing the dataset into 4 groups.

Deciles
• Dividing the dataset into 10 groups.

Percentiles
• Dividing the dataset into 100 groups.
Quartile
• Any of the three fractiles obtained by dividing the set of data into four equal parts

• $Q_1$ is the first (lower) quartile; the lowest 25% of the data lie at or below it.

• $Q_2$ is the median, which divides the data into two equal parts.

• $Q_3$ is the third (upper) quartile; the upper 25% of the data lie at or above it.
Quartile
There is no universal agreement on a single procedure for calculating quartiles, and different computer programs often yield different results. For example, the data set 1, 3, 6, 10, 15, 21, 28, 36 gives these results:

Program      Q1     Q2     Q3
STATDISK     4.5    12.5   24.5
Minitab      3.75   12.5   26.25
Excel        5.25   12.5   22.75
TI-83 Plus   4.5    12.5   24.5
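The differences in the table come from different positioning rules. The sketch below implements three common rules that reproduce the table's numbers: medians of the lower and upper halves, position $k(n+1)/4$, and position $1 + k(n-1)/4$, each with linear interpolation. Matching each rule to a program is an assumption based only on the fact that the numbers agree; it is not a statement of how those programs are implemented. All function names are hypothetical.

```python
def interpolate_at(ordered, pos):
    """Value at 1-based position pos in sorted data, interpolating linearly
    between neighbours; positions outside [1, n] are clamped to the ends."""
    n = len(ordered)
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    lower = int(pos)    # whole-number part of the position
    frac = pos - lower  # fractional part used for interpolation
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def median_sorted(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    s = sorted(values)
    half = len(s) // 2
    return median_sorted(s[:half]), median_sorted(s), median_sorted(s[len(s) - half:])


def quartiles_n_plus_1(values):
    """Q_k located at position k(n + 1)/4."""
    s = sorted(values)
    return tuple(interpolate_at(s, k * (len(s) + 1) / 4) for k in (1, 2, 3))


def quartiles_n_minus_1(values):
    """Q_k located at position 1 + k(n - 1)/4."""
    s = sorted(values)
    return tuple(interpolate_at(s, 1 + k * (len(s) - 1) / 4) for k in (1, 2, 3))


data = [1, 3, 6, 10, 15, 21, 28, 36]
for label, rule in [
    ("halves (STATDISK, TI-83 Plus row)", quartiles_halves),
    ("k(n + 1)/4 (Minitab row)", quartiles_n_plus_1),
    ("1 + k(n - 1)/4 (Excel row)", quartiles_n_minus_1),
]:
    print(label, rule(data))
```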
Measures of Location
By Interpolation
• Quartile – One fourth
First (1/4), Second (1/2), Third (3/4)
Quartile locator (Lq):
• Decile – One tenth
10%, 20%, …, 90%
Decile locator (Ld):
• Percentile − One hundredth
1%, 2%, …, 99%
Percentile locator (Lp):
Measures of Location
Examples:
1. Consider the observations 11, 14, 17, 23, 27, 32, 40,
49, 54, 59, 71, and 80. What is the 29th percentile?
Hint: Solve by interpolation.
Locator ()
Measures of Location
2. The magazine Forbes publishes an annual list of the world’s wealthiest individuals. For 2007, the net worths of the 20 richest individuals, in billions of dollars, in no particular order, are as follows:

18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32,
33, 49, 52, 56

Find the first percentile by interpolation.
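The lecture's percentile locator formula did not survive extraction, so the sketch below assumes the locator $L = k(n+1)/100$ (the same convention that reproduces the Minitab row of the quartile table) with linear interpolation. It also assumes that a locator below 1 is clamped to the smallest value, which is what happens for the first percentile with $n = 20$ ($L = 0.21$). Other locator conventions can give different answers. The helper name `percentile_interp` is hypothetical.

```python
def percentile_interp(values, k):
    """k-th percentile using the assumed locator L = k(n + 1)/100 and linear
    interpolation; locators outside [1, n] are clamped to the min or max."""
    s = sorted(values)
    n = len(s)
    pos = k * (n + 1) / 100
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    lower = int(pos)    # index (1-based) of the value just below the locator
    frac = pos - lower  # how far to move toward the next value
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])


obs = [11, 14, 17, 23, 27, 32, 40, 49, 54, 59, 71, 80]
p29 = percentile_interp(obs, 29)  # L = 29 * 13 / 100 = 3.77, so 17 + 0.77 * (23 - 17)

net_worth = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32,
             33, 49, 52, 56]
p1 = percentile_interp(net_worth, 1)  # L = 0.21 < 1, so the minimum is returned

print(round(p29, 2), p1)  # 21.62 18
```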


Figure: Converting from the kth Percentile to the Corresponding Data Value
Measures of Variation (Dispersion)
Why study dispersion?
• A measure of location, such as the mean or the median, does not tell us anything about the spread of the data.
• For example, if your nature guide told you that the
river averaged 3 feet in depth, would you want to wade
across on foot without additional information?
Probably not. You would want to know something
about the variation in depth.
Measures of Variation (Dispersion)
Why study dispersion?
• A second reason is to compare the spread in two or
more distributions.
• These are measures of the average distance of each observation from the center of the distribution.
• They measure the homogeneity or heterogeneity of a
particular group.
Measures of Variation
• Range
- The difference between the largest and smallest
number in the set
• Mean Absolute Deviation (MAD)
- The average of the absolute (unsigned) deviations from the mean
• Variance
- The average of the squared deviations from the mean
• Standard Deviation (SD)
- The population (sample) standard deviation is the positive square root of the population (sample) variance
Measures of Variation
• Coefficient of Variation (CV)
- The ratio of the standard deviation to the mean, expressed as a percentage
Range
$R = H - L$, where $H$ is the highest value and $L$ is the lowest value
Consider the following data.
Grades in Statistics
Males        Females
Jon 100      Ann 84
Ron 65       Ria 86
Dan 75       Let 85
Tom 85       Bel 82
Bob 95       Nel 83
Range 35     Range 4
Range
Conclusion: Grades of males are more scattered while
grades of females are more compressed. Females are
more homogeneous in their math ability.

Disadvantages of the range:
1. Unstable for a very large class
2. Unreliable, since only two values are taken into account
3. The ranges of two data sets with unequal numbers of scores are not directly comparable
Variance and Standard Deviation
• Sample variance ($s^2$)
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
• Sample standard deviation ($s$)
$s = \sqrt{s^2}$
- Positive square root of $s^2$

The quantity $n - 1$ is often called the degrees of freedom associated with the variance estimate.
Variance and Standard Deviation
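• Population variance ($\sigma^2$)
$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
• Population standard deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$
- Positive square root of $\sigma^2$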
Variance
Determine the variance in the previous example, treating the data first as a population and then as a sample.
Grades in Statistics
Males        Females
Jon 100      Ann 84
Ron 65       Ria 86
Dan 75       Let 85
Tom 85       Bel 82
Bob 95       Nel 83
Mean 84      Mean 84
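Variance
Males
$\sigma^2 = \frac{16^2 + (-19)^2 + (-9)^2 + 1^2 + 11^2}{5} = \frac{820}{5} = 164$, $s^2 = \frac{820}{4} = 205$
Variance
Females
$\sigma^2 = \frac{0^2 + 2^2 + 1^2 + (-2)^2 + (-1)^2}{5} = \frac{10}{5} = 2$, $s^2 = \frac{10}{4} = 2.5$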
Variance
Conclusion: The males' grades show more variability. The higher the variance, the more spread out the values are from one another.

Remark: Since the variance is expressed in squared units, it is not on the same scale as the data being measured; its positive square root, the standard deviation, is.
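Standard Deviation
Males
$\sigma = \sqrt{164} \approx 12.81$, $s = \sqrt{205} \approx 14.32$
Females
$\sigma = \sqrt{2} \approx 1.41$, $s = \sqrt{2.5} \approx 1.58$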
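A minimal sketch that recomputes the range, the population and sample variances, and the standard deviations for the grades table, so the hand calculations can be checked. The only difference between the two variances is the divisor ($N$ versus $n - 1$). The helper name `variance` is hypothetical.

```python
import math


def variance(values, sample):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n


males = {"Jon": 100, "Ron": 65, "Dan": 75, "Tom": 85, "Bob": 95}
females = {"Ann": 84, "Ria": 86, "Let": 85, "Bel": 82, "Nel": 83}

for label, grades in (("Males", males), ("Females", females)):
    g = list(grades.values())
    pop_var = variance(g, sample=False)
    samp_var = variance(g, sample=True)
    print(f"{label}: range = {max(g) - min(g)}, "
          f"population variance = {pop_var}, sample variance = {samp_var}, "
          f"population SD = {math.sqrt(pop_var):.2f}, sample SD = {math.sqrt(samp_var):.2f}")
```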
Measures of Variation
Example:

Consider the following test scores:


Test 1 2 3 4 5 6 7 8 9 10
Student 12 6 13 2 5 0 9 6 10 7
Student 8 10 9 12 5 1 4 7 9 3
a. Who performed better?
b. Who is more consistent?
Measures of Variation
a. Compute the average score of each student.
First row: $\bar{x} = 70/10 = 7.0$; second row: $\bar{x} = 68/10 = 6.8$.

The student in the first row performed better because of the higher computed average.
Measures of Variation
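b. Compute the sample standard deviations.
First row: $s = \sqrt{154/9} \approx 4.14$; second row: $s = \sqrt{107.6/9} \approx 3.46$.

The student in the second row is more consistent because of the lower standard deviation.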
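The two-part comparison can be sketched with Python's standard `statistics` module: `statistics.mean` for part (a) and `statistics.stdev` (the sample standard deviation, divisor $n - 1$) for part (b). The students' names did not survive extraction, so the rows are labeled only by their position in the table.

```python
import statistics

# The two "Student" rows of the score table, in the order they appear.
first_row = [12, 6, 13, 2, 5, 0, 9, 6, 10, 7]
second_row = [8, 10, 9, 12, 5, 1, 4, 7, 9, 3]

summary = {}
for label, scores in (("first row", first_row), ("second row", second_row)):
    mean = statistics.mean(scores)  # a. average score
    sd = statistics.stdev(scores)   # b. sample standard deviation (n - 1 divisor)
    summary[label] = (mean, sd)
    print(f"{label}: mean = {mean:.1f}, s = {sd:.2f}")

better = max(summary, key=lambda k: summary[k][0])      # higher mean
consistent = min(summary, key=lambda k: summary[k][1])  # lower standard deviation
print("performed better:", better)
print("more consistent:", consistent)
```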
Measures of Variation
Remark: Standard deviation and variance are both reliable, but they cannot be used to compare the variability of data sets measured in different units.

Example: Is a player more consistent in making assists or in scoring points?

Coefficient of Variation
$CV = \frac{s}{\bar{x}} \times 100\%$
where:
$s$ = standard deviation ($s$ or $\sigma$)
$\bar{x}$ = mean ($\bar{x}$ or $\mu$)
Player ’s record of assists and points in Game 1:

A 7 10 9 1 5 3 4 7 9 4
P 25 25 30 22 23 22 16 35 20

The player is more consistent in making points.
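A minimal sketch of the coefficient of variation for the player's record, using the sample standard deviation. The points row in the source lists only nine values (against ten for assists), so the sketch uses the nine values exactly as printed; the conclusion does not depend on this because the two CVs differ widely. The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


assists = [7, 10, 9, 1, 5, 3, 4, 7, 9, 4]
points = [25, 25, 30, 22, 23, 22, 16, 35, 20]  # nine values, as printed in the source

cv_assists = coefficient_of_variation(assists)
cv_points = coefficient_of_variation(points)
print(f"CV assists = {cv_assists:.1f}%")  # 50.2%
print(f"CV points  = {cv_points:.1f}%")   # 23.0%
```

The lower CV for points supports the conclusion that the player is more consistent in scoring points, even though the points have the larger standard deviation in absolute terms.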


Measures of Variation
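• Mean Absolute Deviation (MAD)
$MAD = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$
• Interquartile Range (IR)
$IR = Q_3 - Q_1$
• Quartile Deviation (QD)
$QD = \frac{Q_3 - Q_1}{2}$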

Measures of Shape
• Skewness
- The degree of asymmetry of a distribution about its mean; it measures how far the data depart from symmetry
- Described as symmetric, positively skewed, or negatively skewed

• Kurtosis
- The degree of peakedness exhibited by the distribution
- Computed from the fourth moment about the mean
Skewness
Pearsonian Coefficient of Skewness (Pearson’s Coefficient
of Skewness)

Interpretation of values:
1. Sk < 0, “negatively skewed” or “skewed to the left”
2. Sk = 0, symmetrical
3. Sk > 0, “positively skewed” or “skewed to the right”
Skewness
• A measure of the asymmetry of the frequency distribution

a. Positive skewness: mode < median < mean


b. Symmetrical: mode = median = mean
c. Negative skewness: mode > median > mean
Skewness
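Moment Based Coefficient of Skewness
$a_3 = \frac{m_3}{m_2^{3/2}}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$

Bowley’s Coefficient of Skewness
$Sk_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$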
Skewness
Other formulas

Interpretation of values from formulas above:


1. Sk < 0, “negatively skewed” or “skewed to the left”
2. Sk = 0, symmetrical
3. Sk > 0, “positively skewed” or “skewed to the right”
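The sketch below computes two skewness coefficients for the data set from the quartile table. It assumes the moment coefficient $m_3 / m_2^{3/2}$ with divisor $n$ (some texts use $s^3$ with divisor $n - 1$ instead) and Bowley's quartile coefficient with quartiles from `statistics.quantiles` (the $k(n+1)/4$ convention). Pearson's coefficient is not coded because its formula did not survive extraction. All helper names are hypothetical.

```python
import statistics


def central_moment(values, k):
    """k-th moment about the mean, with divisor n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def moment_skewness(values):
    """m3 / m2**1.5: zero for symmetric data, positive for a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def bowley_skewness(values):
    """(Q3 + Q1 - 2*Q2) / (Q3 - Q1), using k(n + 1)/4 quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


data = [1, 3, 6, 10, 15, 21, 28, 36]
print(round(moment_skewness(data), 4), round(bowley_skewness(data), 4))

# A symmetric data set gives zero on both measures.
print(moment_skewness([1, 2, 3, 4, 5]), bowley_skewness([1, 2, 3, 4, 5]))
```

Both coefficients are positive for the example data, so by the interpretation above it is positively skewed (skewed to the right).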
Kurtosis
• A measure of the degree to which a unimodal
distribution is peaked
• The state or quality of flatness or peakedness of the
curve describing a frequency distribution about its
mode

Figure: leptokurtic, mesokurtic, and platykurtic curves
Kurtosis
Moment Based Coefficient of Kurtosis
Kurtosis
Absolute Kurtosis
$K = \frac{m_4}{m_2^2}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$
Interpretation of values from formulas above:


1. K < 3, “platykurtic”
2. K = 3, “mesokurtic”
3. K > 3, “leptokurtic”
Kurtosis
Relative Kurtosis
$K_r = K - 3$
Interpretation of values from formulas above:


1. Relative kurtosis < 0, “platykurtic”
2. Relative kurtosis = 0, “mesokurtic”
3. Relative kurtosis > 0, “leptokurtic”
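A minimal sketch of absolute kurtosis $K = m_4 / m_2^2$ (moments with divisor $n$, an assumption since texts differ) and relative kurtosis $K - 3$, classified with the thresholds above. It uses the data set from the quartile table and a hypothetical heavy-tailed data set for contrast. Helper names are hypothetical.

```python
def central_moment(values, k):
    """k-th moment about the mean, with divisor n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def absolute_kurtosis(values):
    """m4 / m2**2; equals 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def classify(k, tol=1e-9):
    """Interpret absolute kurtosis using the 3.0 reference value."""
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "platykurtic" if k < 3 else "leptokurtic"


data = [1, 3, 6, 10, 15, 21, 28, 36]
K = absolute_kurtosis(data)
print(round(K, 4), round(K - 3, 4), classify(K))  # relative kurtosis is K - 3

# Hypothetical data: most values at the center plus two far-out values.
heavy_tailed = [0] * 8 + [-10, 10]
print(absolute_kurtosis(heavy_tailed), classify(absolute_kurtosis(heavy_tailed)))
```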
Summary
• The measures of central tendency are mean, median,
and mode. Midrange is rarely used.
• Different types of means (arithmetic, weighted,
geometric, harmonic, etc.) are computed depending on
the nature of data.
• The measures of location are quartiles, deciles, and
percentiles.
• The measures of variation tell us about how the data is
distributed about the mean.
• The measures of shape refer to either skewness or
kurtosis.
References
• Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th ed.
• Microsoft® Excel
• Walpole et al. Probability and Statistics for Engineers and Scientists, 9th ed. © 2012, 2007, 2002
• http://irving.vassar.edu/faculty/wl/econ209/dessript.pdf
• http://www.preciousheart.net/chaplaincy/Auditor_Manual/10descsd.pdf
