Review Question - C2 - SACR3080

Uploaded by

Indira Brown


Chapter 2

Measures of Central Tendency

Notes:
• Mode (Mo): The most frequently observed category or value in a dataset.
• Median (Md): The value that divides an ordered dataset into two equal
halves. It is also the 50th percentile (P50), marking the point where 50% of the data
lies below and 50% above.
• Mean (x̄): The average of all values in a dataset, calculated by summing
the values (∑xi) and dividing by the number of observations (N). Often referred to
as "x-bar": x̄ = ∑xi / N.
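As a quick illustration of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The nine values are made up for the example and do not come from the course material.

```python
import statistics

# Hypothetical sample of nine observations (illustrative only).
values = [1, 2, 2, 2, 3, 4, 5, 6, 11]

mode = statistics.mode(values)      # most frequently observed value
median = statistics.median(values)  # middle value of the ordered data (P50)
mean = sum(values) / len(values)    # x-bar = sum of x_i divided by N

print(mode, median, mean)
```

The three measures differ here because the single large value (11) pulls the mean above the median, while the mode simply reports the most common value.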

Review Questions:
1. What rules of thumb are sometimes offered as to how levels of measurement should be
associated with measures of central tendency?

• Nominal: Use the mode since there is no inherent order, and the mean and median cannot
be applied.
• Ordinal: Use the median because the data has a rank or order, but the distances between
categories are not precise. The mode is also applicable but doesn't take advantage of
ordering.
• Interval/Ratio: Use mode, median, or mean, as the data is ordered and the distances
between values are consistent.
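The rules of thumb above can be written down as a small lookup table. This is only a sketch of the convention described in these notes; the names `ALLOWED_MEASURES` and `allowed_measures` are hypothetical.

```python
# Rule-of-thumb mapping from level of measurement to usable measures.
# Names are illustrative, not from any library.
ALLOWED_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

def allowed_measures(level):
    """Return the measures of central tendency usable at a given level."""
    return ALLOWED_MEASURES[level.lower()]

print(allowed_measures("Ordinal"))
```

Each level inherits the measures of the levels below it, which is why a lower-level measure remains available for interval or ratio data.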

2. What additional considerations must be borne in mind?

• It is acceptable to use a lower-level measure (like the median or mode) even with
interval or ratio data, depending on the goal of the analysis. For instance, the median
might be a better choice when the mean does not represent the central tendency well due
to outliers.
• If we have interval or ratio data, and we have reason to point out the value which divides
our observations into upper and lower halves, the median may be preferable to the mean.
We may also prefer the median when it better represents the bulk of our cases.

3. What are three situations in which we might want to use a mode (a) and one in which it could
be misleading (b)?

(a) The mode is preferable when dealing with nominal or ordinal data, in the presence of skewed
distributions or outliers, and when identifying common categories or preferences is more
relevant than calculating an average.

Explanation:
• Nominal Data: when dealing with nominal data (categories without a specific order), the mode
is the only measure of central tendency that can be used. It helps identify the most
frequently occurring category, such as the most popular product in a survey.
• Non-Normal Distributions: in distributions that are skewed or have outliers, the mode
can provide a better representation of the most common value than the mean. This is
because the mode is not influenced by the magnitude of extreme values.
• Ordinal Data: in ordinal datasets, where the values have a meaningful order but the
intervals between values are not consistent, the mode can effectively represent the most
common ranking or preference without assuming equal distances between categories.
• Descriptive Statistics in Specific Fields: in certain fields such as marketing, sociology,
and psychology, identifying the most common response or behavior can be more valuable
than averaging all responses. For example, knowing the most frequently chosen option in
a consumer preference study can inform product development.
• Robustness to Outliers: the mode remains unchanged despite the presence of extreme
values, making it a stable measure when the dataset contains outliers that distort the
mean.

Note: instability of the mode: the mode becomes unstable when two or more categories are
about equally common.

(b) The mode can be misleading when the most frequent value(s) is not representative of the
overall group.
Example: Bimodal or Multimodal Distributions:

• In datasets with two or more modes, the presence of multiple values that occur with the
highest frequency can lead to confusion. For example, in a test score dataset where
students cluster around two different scores (e.g., 50 and 80), reporting one mode hides
the other cluster, and reporting both suggests that the two scores are equally
representative of student performance, which may not reflect the overall distribution.
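The bimodal case can be sketched with `statistics.multimode`, which returns every value tied for the highest frequency. The scores below are invented for illustration. The sketch also reports the share of cases each mode covers, since a mode that covers only a small share of the data says little about the group.

```python
from collections import Counter
import statistics

# Hypothetical test scores clustering around 50 and 80.
scores = [50, 50, 50, 55, 65, 80, 80, 80, 85]

modes = statistics.multimode(scores)  # all values tied for most frequent
counts = Counter(scores)

# Proportion of all cases that each mode accounts for.
share = {m: counts[m] / len(scores) for m in modes}

print(modes)
print(share)
```

Here each mode covers only a third of the students, so neither one is a good summary of the whole class.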

4. Define the median. What are three situations in which we might want to use it?

• The Median (Md): the value that divides an ordered dataset into two equal
halves. It is also the 50th percentile (P50), marking the point where 50% of
the data lies below and 50% above.
• The median is preferred for skewed distributions, datasets with outliers, and ordinal data.
It is also preferred when it better represents the bulk of the cases.
Explanation:
• Income and Wealth Distribution: in analyzing income or wealth data, the median is
often preferred because it is less affected by extreme values (outliers). For instance, in a
dataset where most individuals earn between $30,000 and $50,000 but a few earn
millions, the mean income might give a skewed impression of the average income. The
median, however, would provide a more accurate reflection of what a typical person
earns, as it represents the middle point of the distribution.
• Ordinal Data: when working with ordinal data, where the values have a specific order
but the intervals between them are not consistent, the median is a suitable measure. For
example, in a survey asking respondents to rate satisfaction on a scale of 1 to 5, the
median can indicate the central tendency of responses without assuming equal differences
between the ratings.
• Skewed Distributions: in datasets that are skewed (either positively or negatively), the
median provides a better measure of central tendency than the mean. For example, in a
dataset of test scores where a few students perform exceptionally well while most
perform at a lower level, the mean might be inflated by the high scores. The median will
give a more reliable representation of the typical test score among the majority of
students.
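The income example above can be made concrete with a short sketch. The eight incomes are hypothetical: seven fall between $30,000 and $50,000 and one is $2,000,000.

```python
import statistics

# Hypothetical incomes: seven typical earners and one extreme case.
incomes = [30_000, 35_000, 38_000, 42_000, 45_000, 48_000, 50_000, 2_000_000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # midpoint of the ordered incomes

print(mean_income, median_income)
```

The mean (286,000) is larger than every income but one, while the median (43,500) sits among the typical earners.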

5. What is one situation in which a median could be misleading? What might we do in such a
situation?

• The median can be misleading when there are few cases around the center of the
distribution. To counter this, we can use techniques like the broadened median, nearby
percentiles, or the mid-mean, which incorporate more data for a more stable measure of
central tendency. If these options aren't viable, relying on the mode or accepting the
instability of the median may be necessary.

Explanation:
One situation in which the median could be misleading is when dealing with small samples
that have sparse data points near the center of the distribution. In such cases, minor changes
in the values of just a few observations can lead to significant shifts in the median. For example,
in a “bathtub” distribution where there are large gaps between the central cases, the median may
not reliably represent the central tendency if a few data points are added or adjusted.
What to Do in Such a Situation:
Broadened Median:
- Instead of relying solely on the median, calculate a "broadened median" by taking the average
of the median and one or two observations on either side. This approach incorporates more data
and can provide a more stable measure.
Use of Nearby Percentiles:
- Calculate the average of percentiles near the center, such as (P40 + P50 + P60) / 3. This
method utilizes additional data points surrounding the median, leading to a more stable estimate.
Mid-Mean:
- Compute the "mid-mean," which is the mean of all observations in the central half of the
distribution. This approach takes into account more data points than the median alone, which
helps to mitigate the instability.
Consider the Mode:
- If the median remains unstable and the data allows for it, consider using the mode as an
alternative measure of central tendency, especially if the data is categorical or ordinal.
Assess Sample Size:
- Whenever possible, increase the sample size to minimize the impact of sparse data points,
making the median a more reliable measure in the first place.
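The three stabilizing alternatives described above can be sketched as follows. The function names are hypothetical, and the exact definitions are assumptions: textbooks differ on how many neighbours a broadened median includes, and percentile values depend on the interpolation method (here the default of `statistics.quantiles`). The mid-mean is approximated by dropping the lowest and highest quarter of the cases.

```python
import statistics

def broadened_median(data, k=1):
    """Mean of the middle case(s) plus k observations on either side."""
    xs = sorted(data)
    n = len(xs)
    lo = (n - 1) // 2 - k  # index of the lowest case kept
    hi = n // 2 + k        # index of the highest case kept
    if lo < 0:
        raise ValueError("k is too large for this sample")
    return statistics.mean(xs[lo:hi + 1])

def percentile_average(data):
    """(P40 + P50 + P60) / 3, taken from the decile cut points."""
    deciles = statistics.quantiles(data, n=10)  # 9 cut points: P10 ... P90
    p40, p50, p60 = deciles[3], deciles[4], deciles[5]
    return (p40 + p50 + p60) / 3

def mid_mean(data):
    """Mean of the central half: drop the lowest and highest quarter."""
    xs = sorted(data)
    cut = len(xs) // 4
    return statistics.mean(xs[cut:len(xs) - cut])

# Hypothetical "bathtub" sample: cases pile up at both ends, few in the middle.
sample = [1, 2, 2, 3, 10, 18, 19, 19, 20]

print(statistics.median(sample))
print(broadened_median(sample), percentile_average(sample), mid_mean(sample))
```

All three alternatives draw on several central cases rather than a single one, so a small change to one observation near the middle moves them less than it moves the plain median.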

6. What is one situation in which we might want to use a mean, and one situation in which it
could be misleading?

• The mean is useful for symmetric datasets without outliers. It can be misleading
(unstable) when outlying cases are present, because cases that are extreme enough can
shift it greatly. Income distributions are a common example; there, alternatives like the
median or trimmed mean provide a more accurate measure of central tendency.

Explanation:
One situation where we might want to use the mean is when dealing with a dataset that is
symmetrically distributed without outliers. For example, in a study of test scores where most
students score between 70 and 90, the mean would accurately reflect the average performance of
the entire group, providing a useful measure of central tendency.

The mean can be misleading in the presence of outliers, such as in income distributions. For
instance, if a few individuals earn significantly higher incomes compared to the rest of the
population, the mean income may be inflated and not accurately represent the typical income of
the majority. In such cases, alternatives like the median or trimmed mean are preferred, as they
provide a more accurate reflection of the central tendency without being skewed by extreme
values.

7. Suppose that we have ratio data, e.g., earned income, but the sample is small and there is
reason to think that the mean is unstable. What are some alternatives?

• When the mean is unstable in small samples of ratio data like earned income, alternatives
such as trimmed means, the median, mid-mean, Winsorized mean, percentile measures,
and weighted means can be utilized to provide a more reliable and representative measure
of central tendency. These methods are particularly effective in handling the influence of
outliers.

Explanation:
Trimmed Mean:
- In a trimmed mean, a specified percentage (p%) of the highest and of the lowest observations
is removed before recalculating the mean. This approach helps mitigate the impact of outliers
while retaining more data than the median. Commonly, between 5% and 10% is trimmed from
each end.
Median:
- The median is a measure of central tendency that divides the dataset into two equal halves. It
is largely unaffected by outliers, making it a preferred measure in income data where extreme
values can distort the mean. The median provides a clear interpretation of the typical income level.
Mid-Mean:
- The mid-mean calculates the average of all observations in the central half of the distribution.
This method reduces the influence of extreme values by focusing on data that are closer to the
median, providing a more stable average.
Winsorized Mean:
- Winsorizing involves replacing the extreme values in the dataset with values closer to the
center (e.g., replacing the highest 1% of values with the next highest remaining value, and the
lowest 1% with the next lowest remaining value). This method helps reduce the effect of outliers
while keeping every case in the average.
Percentile Measures:
- Using percentile values, such as the 25th (P25), 50th (P50, or median), and 75th (P75), can
give a clearer picture of the income distribution without being affected by outliers. This approach
helps to understand the spread and typical earnings.
Weighted Mean:
- Assigning lower weights to cases that are further away from the central part of the
distribution can also help create a more stable average. This method allows the mean to be
adjusted based on the distribution's shape.
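The trimmed and Winsorized means described above can be sketched in a few lines. The function names and the ten incomes are hypothetical, and the rule used to turn a proportion into a number of cases (rounding down) is an assumption; statistical packages may round differently.

```python
import statistics

def trimmed_mean(data, proportion=0.10):
    """Mean after dropping `proportion` of the cases from EACH end."""
    xs = sorted(data)
    k = int(len(xs) * proportion)  # cases dropped per end (rounded down)
    if 2 * k >= len(xs):
        raise ValueError("proportion trims away the whole sample")
    return statistics.mean(xs[k:len(xs) - k])

def winsorized_mean(data, proportion=0.10):
    """Mean after replacing the extreme cases with their nearest kept value."""
    xs = sorted(data)
    k = int(len(xs) * proportion)
    if 2 * k >= len(xs):
        raise ValueError("proportion covers the whole sample")
    if k:
        xs[:k] = [xs[k]] * k        # lowest k cases take the next lowest value
        xs[-k:] = [xs[-k - 1]] * k  # highest k cases take the next highest value
    return statistics.mean(xs)

# Hypothetical earned incomes in thousands, with one extreme case.
incomes = [28, 31, 33, 35, 38, 40, 42, 45, 48, 900]

print(statistics.mean(incomes))   # ordinary mean
print(trimmed_mean(incomes))      # 10% trimmed from each end
print(winsorized_mean(incomes))   # 10% Winsorized at each end
```

Trimming discards the extreme cases, while Winsorizing keeps the sample size and only pulls the extremes inward. On this sample both land close to the bulk of the incomes, unlike the ordinary mean.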
8. Why is the sample mean often replaced by, or supplemented by, the median for income
distributions?

• The sample mean is often replaced or supplemented by the median for income
distributions because the median is resistant to outliers, is easy to interpret, and
provides a clearer picture of typical income levels.

9. For skewed distributions, what often happens as we trim more cases from the ends of the
distribution?

• For skewed distributions, as we trim more cases from the ends, trimmed means tend to
get closer to the median. This means that when we remove extreme values from either
side of the distribution, the average of the remaining values (the trimmed mean) shifts
closer to the middle value (the median).

Explanation:
By cutting out the outliers—whether they are very high or very low incomes—we get a clearer
picture of what most people earn. As we keep trimming away these extreme cases, we find that
the average (trimmed mean) aligns more closely with the median, which better reflects the
typical income. This process helps to smooth out the influence of unusual cases and provides a
more accurate understanding of the data.
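A small sketch makes the pattern visible. The fifteen right-skewed values are hypothetical; the loop trims k cases from each end and records how far the resulting mean is from the median.

```python
import statistics

# Hypothetical right-skewed sample (e.g., incomes in thousands), already sorted.
incomes = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 50, 200]
median = statistics.median(incomes)

# Trim k cases from each end and record the mean of what remains.
trimmed_means = []
for k in range((len(incomes) + 1) // 2):
    kept = incomes[k:len(incomes) - k]
    trimmed_means.append(statistics.mean(kept))

# Distance between each trimmed mean and the median.
gaps = [abs(m - median) for m in trimmed_means]

for k, (m, gap) in enumerate(zip(trimmed_means, gaps)):
    print(f"k={k}: trimmed mean={m:.2f}, gap to median={gap:.2f}")
```

On this sample the gap shrinks at every step and reaches zero when only the middle case is left. That steady shrinking is a feature of this particular sample, not a guarantee; the general statement is that trimmed means tend toward the median.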

10. In what sense are the mode, median, and mean averages?

• The mode is the most common value in a dataset, indicating frequency, but it can be
misleading in datasets with many categories or when it represents only a small portion of
the data; thus, it's important to include the number or proportion of cases it reflects. The
median, as a "positional average," lies in the middle of an ordered dataset, dividing it into
two equal halves and minimizing the sum of absolute deviations from it. The mean is
calculated by summing all values and dividing by the number of observations, balancing
the data so that the total deviations from it equal zero, thus providing a central point that
minimizes the sum of squared deviations.
Explanation:

Mode: the mode is the most common or typical value in a dataset. It shows which value occurs
most frequently. However, it can sometimes be misleading, especially in datasets with many
categories or when the mode accounts for only a small portion of the data. When reporting the
mode, it’s important to include the number or proportion of cases it represents.

Median: the median is a “positional average” because it is located in the middle of an ordered
dataset. It divides the data into two equal halves. An important feature of the median is that it
minimizes the sum of absolute deviations from it, meaning that no other value can achieve a
lower total distance when you look at how far each data point is from the median.

Mean: the mean is calculated by adding all the values in a dataset and dividing by the number of
observations. It is considered an average because it balances the data by ensuring that the total of
the deviations from the mean (both above and below) equals zero. This makes it a central point
in the sense that it neither overestimates nor underestimates the data. Additionally, the mean
minimizes the sum of squared deviations, so it is the best single-value summary when error is
measured by squared distance from each observation.

11. In what sense is the median the point closest to the observed data?

• The median is considered the point closest to the observed data because it minimizes the
sum of absolute deviations. This means that the total distance of all data points from the
median is no greater than the total distance from any other value, making it the best
representation of the center of the dataset in that sense.
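This property can be checked numerically. The sketch below uses a made-up sample and a brute-force search over a grid of candidate centers; the grid spacing of 0.1 is an arbitrary choice. It finds the candidate with the smallest total absolute deviation and, for contrast, the candidate with the smallest total squared deviation.

```python
import statistics

# Hypothetical sample with one large value.
data = [1, 2, 2, 3, 4, 7, 30]

def total_abs_dev(c):
    """Sum of absolute deviations of the data from a candidate center c."""
    return sum(abs(x - c) for x in data)

def total_sq_dev(c):
    """Sum of squared deviations of the data from a candidate center c."""
    return sum((x - c) ** 2 for x in data)

median = statistics.median(data)
mean = statistics.mean(data)

# Candidate centers 0.0, 0.1, ..., 30.0.
candidates = [c / 10 for c in range(0, 301)]
best_abs = min(candidates, key=total_abs_dev)
best_sq = min(candidates, key=total_sq_dev)

print(median, best_abs)  # the absolute-deviation minimizer matches the median
print(mean, best_sq)     # the squared-deviation minimizer matches the mean
```

The two criteria pick different centers on the same data: total absolute distance is smallest at the median, and total squared distance is smallest at the mean.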

12. In what sense does the mean lie in the center of a distribution?

• The mean lies at the center of a distribution by balancing the values on either side. It is
the point where the sum of the deviations above it equals, in absolute value, the sum of the
deviations below it, so the deviations add up to zero. In that sense it neither overestimates
nor underestimates the data overall. In skewed distributions, the mean often shifts toward
the tail, reflecting the influence of extreme values.

13. What is another technical merit of the mean as a measure of central tendency?
• Another technical merit of the mean is that it minimizes the sum of squared deviations
from it. This property underlies least-squares methods: among all single values, the mean
gives the smallest total squared error as a summary of the data.
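The least-squares property follows from a short calculus argument, sketched here with c standing for an arbitrary candidate center (a symbol introduced only for this derivation) and x̄ and N as defined in the notes above.

```latex
\begin{align*}
S(c) &= \sum_{i=1}^{N} (x_i - c)^2 \\
\frac{dS}{dc} &= -2 \sum_{i=1}^{N} (x_i - c) = 0
  \;\Longrightarrow\; c = \frac{1}{N} \sum_{i=1}^{N} x_i = \bar{x} \\
\frac{d^2 S}{dc^2} &= 2N > 0
\end{align*}
```

The positive second derivative confirms that x̄ is a minimum. The first-order condition is the same statement as the balancing property: the deviations from the mean sum to zero.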

14. In relation to one another, where do the mode, median, and mean lie if we have a
continuous, single-peaked, and skewed distribution?

• In a continuous, single-peaked, and skewed distribution, the mode is located at the peak
of the distribution, the median lies in the middle, and the mean is pulled toward the tail.
• In other words, for single-peaked, skewed, and continuous distributions, the mean
will lie farthest into the long tail, followed by the median, and the mode will lie in neither
tail.
• Generally, the ordering is mode < median < mean for right-skewed distributions, with the
mean being the furthest into the tail. For left-skewed distributions the order is typically
reversed: mean < median < mode. This relationship can vary in other types of
distributions, highlighting the importance of context when analyzing these measures.
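The ordering for a right-skewed distribution can be illustrated with the lognormal distribution, which is continuous, single-peaked, and right-skewed. The choice of the lognormal and of the parameters mu = 0 and sigma = 1 is an assumption made for this example; the three formulas are the standard closed-form expressions for that distribution.

```python
import math

# Lognormal parameters (chosen for illustration only).
mu, sigma = 0.0, 1.0

mode = math.exp(mu - sigma ** 2)      # location of the peak
median = math.exp(mu)                 # 50th percentile
mean = math.exp(mu + sigma ** 2 / 2)  # pulled furthest into the right tail

print(mode, median, mean)
print(mode < median < mean)
```

The mode sits at the peak, the median lies to its right, and the mean lies furthest into the long tail.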
