
2-Descriptive Statistics

The document discusses various statistical measures used to analyze numerical data, including measures of central tendency, measures of variability, and other related concepts. Definitions and calculations are provided for measures like mean, median, mode, range, variance, standard deviation, and more.

Uploaded by Ankur Singh
Copyright © All Rights Reserved

Descriptive Statistics

Numerical Measures

Prof. Suman Kalyan Ghosh


Alliance School of Applied Mathematics
Alliance University
Lecture Overview
• Measures of Location/Central Tendency
• Measures of Variability/Dispersion
• Measures of distribution shape, relative location
• Weighted mean & Grouped Data
• Outliers & Box Plot
• Correlation Coefficient
MEASURES OF LOCATION

• Mean
• Median
• Mode
• Percentiles
• Quartiles
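MEASURES OF LOCATION
Mean:
The mean is the sum of the observations divided by the number of observations.
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$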
MEASURES OF LOCATION
Median:
The median is another measure of central
location. The median is the value in the middle
when the data are arranged in ascending order
(smallest value to largest value).
MEASURES OF LOCATION

Whenever a data set contains extreme values, the median is often the preferred measure of central location.
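The effect of a single extreme value can be seen in a minimal sketch using Python's standard `statistics` module. The rent figures are hypothetical, chosen only so that one value is extreme.

```python
from statistics import mean, median

# Hypothetical monthly rents; the last value is an extreme observation.
rents = [850, 900, 920, 950, 1000, 1050, 5000]

avg = mean(rents)    # pulled far upward by the single extreme value
mid = median(rents)  # middle value of the sorted data, barely affected
print(avg, mid)      # about 1524.29 and 950
```

Dropping the 5000 would change the mean a great deal but would move the median only from 950 to 935.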
MEASURES OF LOCATION

Mode:
The mode is the value that occurs with greatest
frequency.
MEASURES OF LOCATION

Mode:
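A data set can have more than one value with the greatest frequency. As a sketch, `statistics.multimode` (Python 3.8+) returns every such value; the shoe sizes below are hypothetical.

```python
from statistics import multimode

# Hypothetical shoe sizes: 8 and 9 each occur three times.
sizes = [7, 8, 8, 9, 9, 8, 10, 9, 11]

modes = multimode(sizes)  # all values tied for the highest frequency
print(modes)              # [8, 9]  (a bimodal data set)
```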
MEASURES OF LOCATION

Percentiles:
A percentile provides information about how
the data are spread over the interval from the
smallest value to the largest value.
MEASURES OF LOCATION

Percentiles:
For data that do not contain numerous repeated
values, the pth percentile divides the data into
two parts. Approximately p percent of the
observations have values less than the pth
percentile; approximately (100-p) percent of the
observations have values greater than the pth
percentile.
MEASURES OF LOCATION
Percentiles (Formal Definition):
The pth percentile is a value such that at least p
percent of the observations are less than or
equal to this value and at least (100-p) percent
of the observations are greater than or equal to
this value.
MEASURES OF LOCATION
Percentiles:

Colleges and universities frequently report admission test scores in terms of percentiles.
MEASURES OF LOCATION
Percentiles:
Suppose an applicant obtains a raw score of 54 on
the verbal portion of an admission test. How this
student performed in relation to other students
taking the same test may not be readily apparent.
However, if the raw score of 54 corresponds to
the 70th percentile, we know that approximately
70% of the students scored lower than this
individual and approximately 30% of the students
scored higher than this individual.
MEASURES OF LOCATION
Percentiles Calculation:
MEASURES OF LOCATION
Percentiles Calculation Illustration:
MEASURES OF LOCATION
Percentiles Calculation Illustration: find the 85th percentile.
MEASURES OF LOCATION
Percentiles:

Note that the 50th percentile is the median.
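The slide's calculation rule did not survive extraction. As a hedged sketch, the function below assumes one common textbook rule that satisfies the formal definition above: compute i = (p/100)n on the sorted data; if i is not an integer, round it up and take the value in that position; if it is an integer, average the values in positions i and i + 1. Other conventions (for example, the interpolation methods used by spreadsheets or NumPy) can give slightly different values. The scores are hypothetical.

```python
import math
from statistics import median

def percentile(data, p):
    """pth percentile (0 < p < 100) under the assumed rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    values = sorted(data)
    i = p * len(values) / 100          # 1-based position in the sorted data
    if i == int(i):                    # whole-number index: average two neighbours
        k = int(i)
        return (values[k - 1] + values[k]) / 2
    return values[math.ceil(i) - 1]    # fractional index: round up

# Hypothetical admission-test scores (n = 12).
scores = [55, 61, 64, 68, 70, 72, 75, 78, 81, 84, 88, 93]
print(percentile(scores, 85))                     # i = 10.2 -> 11th value, 88
print(percentile(scores, 50) == median(scores))   # True: 50th percentile = median
```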
MEASURES OF LOCATION

Quartiles:
It is often desirable to divide data into four
parts, with each part containing approximately
one-fourth, or 25% of the observations.
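MEASURES OF LOCATION

Quartiles:
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (the median)
Q3 = third quartile, or 75th percentile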
MEASURES OF LOCATION
Quartiles: (Illustration)
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN
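When data are given as a frequency distribution (grouped data), the mean is obtained by
$\bar{x} = \frac{\sum f_i x_i}{N}$
where $f_i$ is the frequency of the value $x_i$ and $N = \sum f_i$ is the total frequency.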
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN
Example:

Value Frequency
1 5
2 9
3 12
4 17
5 14
6 10
7 6
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN
Example:
Value (x) Frequency (f) xf
1 5 5
2 9 18
3 12 36
4 17 68
5 14 70
6 10 60
7 6 42
TOTAL 73=N 299
Mean = 299/73 ≈ 4.10
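The table calculation can be checked with a short sketch of $\bar{x} = \sum f_i x_i / N$; note that 299/73 = 4.0959..., which rounds to 4.10 at two decimal places.

```python
# Value-frequency table from the example above.
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [5, 9, 12, 17, 14, 10, 6]

N = sum(freqs)                                       # total frequency
sum_xf = sum(x * f for x, f in zip(values, freqs))   # sum of x * f
grouped_mean = sum_xf / N
print(N, sum_xf, f"{grouped_mean:.2f}")              # 73 299 4.10
```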
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN

For a grouped (continuous) frequency distribution, “x” is taken as the mid-value of the corresponding class.
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN
Example:
Marks No. of Students
0-10 12
10-20 18
20-30 27
30-40 20
40-50 17
50-60 6
MEASURES OF LOCATION
Grouped Data: AVERAGE or MEAN
Example:
Marks No. of Students (f) Mid-Point (x) x.f
0-10 12 5 60
10-20 18 15 270
20-30 27 25 675
30-40 20 35 700
40-50 17 45 765
50-60 6 55 330
TOTAL 100 2800
Mean=2800/100=28
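A minimal sketch of the same calculation, deriving each class mid-point from its limits before applying $\sum f x / N$:

```python
# Class limits and frequencies from the marks table above.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [12, 18, 27, 20, 17, 6]

midpoints = [(lower + upper) / 2 for lower, upper in classes]  # x = class mid-value
N = sum(students)
mean_marks = sum(x * f for x, f in zip(midpoints, students)) / N
print(midpoints)   # [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
print(mean_marks)  # 28.0
```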
MEASURES OF LOCATION
Grouped Data: MEDIAN
Example: Value Frequency
1 8
2 10
3 11
4 16
5 20
6 25
7 15
8 9
9 6
MEASURES OF LOCATION
Grouped Data: MEDIAN
Example:
1. Find N/2, where N is the total frequency.
2. Find the first cumulative frequency that is greater than N/2.
3. The corresponding value of “x” is the median.
MEASURES OF LOCATION
Grouped Data: MEDIAN
N/2 = 120/2 = 60
Value Frequency Cumulative frequency (<)
1 8 8
2 10 18
3 11 29
4 16 45
5 20 65
6 25 90
7 15 105
8 9 114
9 6 120
The first cumulative frequency greater than 60 is 65, which corresponds to x = 5, so Median = 5.
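The three steps translate directly into a short sketch; `itertools.accumulate` builds the cumulative frequency column. The function name is illustrative only.

```python
from itertools import accumulate

def discrete_grouped_median(values, freqs):
    """Median of a value-frequency table via cumulative frequencies."""
    half = sum(freqs) / 2                          # Step 1: N/2
    for x, cf in zip(values, accumulate(freqs)):
        if cf > half:                              # Step 2: first cf greater than N/2
            return x                               # Step 3: its value is the median
    raise ValueError("empty frequency table")

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(discrete_grouped_median(values, freqs))  # 5
```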
MEASURES OF LOCATION
Grouped Data: MEDIAN

Wage No. of Employees
2000-3000 3
3000-4000 5
4000-5000 20
5000-6000 10
6000-7000 5
MEASURES OF LOCATION
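Grouped Data: MEDIAN
$\text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$
where
N: Total frequency.
L: Lower limit of the median class (the class whose cumulative frequency first exceeds N/2).
f: Frequency of the median class.
h: Width of the median class.
C: Cumulative frequency of the class just preceding the median class.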
MEASURES OF LOCATION
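Grouped Data: MEDIAN
N/2 = 43/2 = 21.5
Wage No. of Employees cf
2000-3000 3 3
3000-4000 5 8
4000-5000 20 28
5000-6000 10 38
6000-7000 5 43
The first cumulative frequency greater than 21.5 is 28, so the median class is 4000-5000, with L = 4000, h = 1000, f = 20 and C = 8.
Median = 4000 + (1000/20)(21.5 − 8)
= 4000 + 675 = 4675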
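A minimal sketch of the interpolation formula, assuming classes of equal width h and taking the median class as the first class whose cumulative frequency reaches N/2. The function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L + (h / f) * (N/2 - C) for classes of equal width h."""
    half = sum(freqs) / 2
    C = 0                                   # cumulative frequency before this class
    for L, f in zip(lower_limits, freqs):
        if C + f >= half:                   # median class found
            return L + (width / f) * (half - C)
        C += f
    raise ValueError("empty frequency table")

# Wage classes from the example: lower limits, width 1000, frequencies.
wages_lower = [2000, 3000, 4000, 5000, 6000]
employees = [3, 5, 20, 10, 5]
print(grouped_median(wages_lower, 1000, employees))  # 4675.0
```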
MEASURES OF LOCATION
Grouped Data: MODE
Example:
Marks No. of Students
0-10 5
10-20 8
20-30 7
30-40 12
40-50 28
50-60 20
60-70 10
70-80 10
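The slide's mode formula did not survive extraction. The sketch below assumes the interpolation formula commonly used for grouped data, Mode = L + (f1 − f0)/(2f1 − f0 − f2) × h, where f1 is the frequency of the modal class (the class with the largest frequency) and f0, f2 are the frequencies of the classes just before and after it. Treating a missing neighbour as frequency 0 is an added assumption, and the function name is hypothetical.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h (assumed standard formula)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)     # index of modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                     # assumed 0 if no class before
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0        # assumed 0 if no class after
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Marks table from the example: modal class 40-50 (frequency 28).
marks_lower = [0, 10, 20, 30, 40, 50, 60, 70]
students = [5, 8, 7, 12, 28, 20, 10, 10]
print(round(grouped_mode(marks_lower, 10, students), 2))  # 46.67
```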
MEASURES OF VARIABILITY

• Range
• Inter Quartile Range (IQR)
• Variance
• Standard Deviation
• Coefficient of Variation (CV)
MEASURES OF VARIABILITY

Range:
Range = Largest value − Smallest value
MEASURES OF VARIABILITY

Range: (Illustration)
The largest starting salary is 3925 and the
smallest is 3310. The range is 3925-3310=615.
MEASURES OF VARIABILITY

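Inter Quartile Range (IQR):
The IQR is the range of the middle 50% of the data, i.e., the difference between the third and first quartiles:
IQR = Q3 − Q1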


MEASURES OF VARIABILITY

Inter Quartile Range (IQR): (Illustration)
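The slide's own illustration did not survive extraction, so this sketch computes the range and IQR for a hypothetical data set. The quartiles use an assumed textbook percentile rule (i = (p/100)n; round a fractional i up, average positions i and i + 1 when i is whole); other quartile conventions can give slightly different values.

```python
import math

def percentile(values, p):
    """pth percentile (0 < p < 100) of an already-sorted list.

    Assumed rule: i = (p/100) * n; if i is fractional, round it up and take
    that position; if i is whole, average positions i and i + 1.
    """
    i = p * len(values) / 100
    if i == int(i):
        k = int(i)
        return (values[k - 1] + values[k]) / 2
    return values[math.ceil(i) - 1]

# Hypothetical data set of n = 12 observations, sorted first.
data = sorted([46, 54, 42, 46, 32, 50, 48, 40, 58, 44, 52, 38])

data_range = data[-1] - data[0]   # largest value - smallest value
q1 = percentile(data, 25)         # first quartile
q3 = percentile(data, 75)         # third quartile
iqr = q3 - q1                     # range of the middle 50% of the data
print(data_range, q1, q3, iqr)    # 26 41.0 51.0 10.0
```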


MEASURES OF VARIABILITY

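Variance:
For Raw Data (N values with mean μ):
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
For Frequency Data (value $x_i$ occurring with frequency $f_i$, $N = \sum f_i$):
$\sigma^2 = \frac{1}{N}\sum_{i} f_i (x_i - \mu)^2$
Sample variance is denoted by $s^2$ (not by $\sigma^2$, which is used to indicate the population variance) and is computed with divisor n − 1:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$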
MEASURES OF VARIABILITY
Variance:
MEASURES OF VARIABILITY

Standard Deviation:
The standard deviation (sd) is defined as the positive square root of the variance. The population sd is denoted by σ.
The sample sd is denoted by s.
For the previous example,
MEASURES OF VARIABILITY

Note:

Note that when we deal with a sample, the standard deviation is denoted by “s”.
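A small sketch contrasting the population divisor (N) with the sample divisor (n − 1), and cross-checking against Python's `statistics.pvariance` and `statistics.variance`. The data are hypothetical.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

# Hypothetical data set of n = 8 observations.
data = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)   # sum of squared deviations

pop_var = ss / n          # sigma^2: data treated as the whole population
samp_var = ss / (n - 1)   # s^2: data treated as a sample
s = sqrt(samp_var)        # sample standard deviation

# The statistics module agrees with the hand formulas.
assert isclose(pop_var, pvariance(data)) and isclose(samp_var, variance(data))
print(pop_var, samp_var, s)  # 3.5 4.0 2.0
```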
MEASURES OF VARIABILITY

Coefficient of Variation (CV):
In some situations we may be interested in a descriptive statistic that indicates how large the standard deviation is relative to the mean. This measure is called the coefficient of variation and is usually expressed as a percentage:
$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$
MEASURES OF VARIABILITY

Coefficient of Variation (CV):

CV is expressed as a percentage and is used to compare the relative variability (consistency) of the performance of individuals measured on a similar variable.
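As a sketch of such a comparison, the two hypothetical players below have the same mean score, so their CVs differ only through the standard deviation; the lower CV marks the more consistent performer. This uses the sample standard deviation; using the population version would change the numbers but not the ranking.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100, in percent."""
    return stdev(data) / mean(data) * 100

# Hypothetical scores of two players over five matches (both average 50).
player_a = [40, 50, 60, 50, 50]
player_b = [10, 90, 50, 30, 70]

print(round(coefficient_of_variation(player_a), 2))  # 14.14 (more consistent)
print(round(coefficient_of_variation(player_b), 2))  # 63.25
```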
MEASURES OF VARIABILITY

Mean Deviation about a point (A):

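(1) For Raw Datasets:
The mean deviation about a point “A” is defined as
$\text{M.D.(about } A) = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$
where A is any one of the averages: Mean (M), Median (Md) or Mode (Mo).
MEASURES OF VARIABILITY

Mean Deviation about a point (A):

(2) For Frequency/Grouped Datasets:
$\text{M.D.(about } A) = \frac{1}{N}\sum_{i} f_i |x_i - A|$
Here $x_i$ is the value (or class mid-point) with frequency $f_i$, and N is the total frequency.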


Work Example
(a) Find the Mean Deviation from the Mean for the following data:

(b) Also find the mean deviation about the median.
(c) Compare the results obtained in (a) and (b).
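Since the example's data table was not preserved, this sketch uses hypothetical data to show both formulas. It also illustrates a known property relevant to part (c): the mean deviation is smallest when taken about the median.

```python
from statistics import mean, median

def mean_deviation(data, a):
    """M.D. about A = (1/n) * sum of |x_i - A| (raw data)."""
    return sum(abs(x - a) for x in data) / len(data)

def grouped_mean_deviation(values, freqs, a):
    """M.D. about A = (1/N) * sum of f_i * |x_i - A|, with N = sum of f_i."""
    return sum(f * abs(x - a) for x, f in zip(values, freqs)) / sum(freqs)

data = [2, 3, 5, 8, 12]                        # hypothetical; mean 6, median 5
md_mean = mean_deviation(data, mean(data))     # about the mean
md_median = mean_deviation(data, median(data)) # about the median
print(md_mean, md_median)                      # 3.2 3.0
```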
MEASURES OF VARIABILITY

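Moments:
• Raw Moments (Moments about origin “0”)
• Central Moments (Moments about sample mean)
MEASURES OF VARIABILITY

Moments: Raw Moments (Moments about origin “0”)
$\mu'_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r, \quad r = 1, 2, 3, \ldots$
MEASURES OF VARIABILITY
Moments: Central Moments (Moments about mean)
$\mu_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r, \quad r = 1, 2, 3, \ldots$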
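A sketch of both kinds of moment on hypothetical data. It also checks the standard identity $\mu_2 = \mu'_2 - (\mu'_1)^2$, so the second central moment (the variance with divisor n) can be obtained from raw moments.

```python
def raw_moment(data, r):
    """r-th raw moment (about origin 0): (1/n) * sum of x_i ** r."""
    return sum(x ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th central moment (about the mean): (1/n) * sum of (x_i - mean) ** r."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical; mean 5
print(raw_moment(data, 1), raw_moment(data, 2))          # 5.0 29.0
print(central_moment(data, 1), central_moment(data, 2))  # 0.0 4.0
```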
MEASURES OF DISTRIBUTION SHAPE

Measures of Skewness:
MEASURES OF DISTRIBUTION SHAPE
Other Measures of Skewness:
MEASURES OF DISTRIBUTION SHAPE
Measure of Kurtosis
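The skewness and kurtosis formulas on these slides did not survive extraction. As an assumption, the sketch uses the standard moment-based coefficients: skewness $\gamma_1 = \mu_3 / \mu_2^{3/2}$ (0 for symmetric data, positive for a longer right tail) and kurtosis $\beta_2 = \mu_4 / \mu_2^2$ (3 for a normal distribution). The data are hypothetical.

```python
def moment_coefficients(data):
    """Return (gamma1, beta2) from central moments with divisor n."""
    n = len(data)
    m = sum(data) / n
    mu2 = sum((x - m) ** 2 for x in data) / n
    mu3 = sum((x - m) ** 3 for x in data) / n
    mu4 = sum((x - m) ** 4 for x in data) / n
    gamma1 = mu3 / mu2 ** 1.5   # moment coefficient of skewness
    beta2 = mu4 / mu2 ** 2      # moment coefficient of kurtosis
    return gamma1, beta2

print(moment_coefficients([1, 2, 3, 4, 5]))           # symmetric: skewness 0.0
print(moment_coefficients([2, 4, 4, 4, 5, 5, 7, 9]))  # right tail: skewness > 0
```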
Pooled Mean & Pooled Variance
Suppose the data is given in different groups in
the following fashion.
Groups Sample Size Sample Mean Sample Variance
Group-1 $n_1$ $\bar{x}_1$ $s_1^2$
Group-2 $n_2$ $\bar{x}_2$ $s_2^2$
Group-3 $n_3$ $\bar{x}_3$ $s_3^2$
Pooled Mean & pooled Variance

Then, to find
• the overall mean, taking all the groups together, we use the concept of pooled mean or overall mean;
• the overall variance, taking all the groups together, we use the concept of pooled variance or composite variance.
Pooled Mean & Pooled Variance
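Pooled Mean:
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3}$
Pooled Variance (or Composite Variance):
$s^2 = \frac{n_1(s_1^2 + d_1^2) + n_2(s_2^2 + d_2^2) + n_3(s_3^2 + d_3^2)}{n_1 + n_2 + n_3}$
where $d_i = \bar{x}_i - \bar{x}$ for i = 1, 2, 3.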
Pooled Mean & Pooled Variance
Example:
In three villages, A, B, and C, the numbers of people living are 150, 190, and 220, respectively. A survey found that the average daily incomes in the 3 villages are 200 INR, 150 INR, and 230 INR, with standard deviations of income of 10, 12, and 16 INR, respectively.
Compute the overall average daily income and the standard deviation of income, considering all 3 villages together.
Pooled Mean & Pooled Variance
Example:

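• Pooled Mean = 109100/560 ≈ 194.82 INR

• Pooled Variance ≈ 1351.18
(Σ n_i s_i² = 98680 and Σ n_i d_i² ≈ 657982.14, so the pooled variance is 756662.14/560 ≈ 1351.18.)

• Pooled Standard Deviation ≈ 36.76 INR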
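A short sketch of the composite-variance calculation for the three villages, assuming divisor N = Σ n_i as in the pooled-mean formula. The deviation term n_i d_i² must be included for every group, including village B, whose mean lies about 44.8 INR below the pooled mean. The function name is hypothetical.

```python
from math import sqrt

def pooled_mean_variance(sizes, means, sds):
    """Pooled mean and composite variance: sum n_i (s_i^2 + d_i^2) / sum n_i."""
    N = sum(sizes)
    pooled_mean = sum(n * m for n, m in zip(sizes, means)) / N
    pooled_var = sum(n * (s ** 2 + (m - pooled_mean) ** 2)   # d_i = mean_i - pooled mean
                     for n, m, s in zip(sizes, means, sds)) / N
    return pooled_mean, pooled_var

mean_all, var_all = pooled_mean_variance([150, 190, 220], [200, 150, 230], [10, 12, 16])
print(round(mean_all, 2), round(var_all, 2), round(sqrt(var_all), 2))  # 194.82 1351.18 36.76
```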


MEASURES OF RELATIVE LOCATION

Z-Score:
Measures of relative location help us determine
how far a particular value is from the mean.
MEASURES OF RELATIVE LOCATION

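Z-Score:
$z_i = \frac{x_i - \bar{x}}{s}$
where $\bar{x}$ is the sample mean and s is the sample standard deviation. The z-score is often called a standardized value.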


MEASURES OF RELATIVE LOCATION
Z-Score:
The z-score, $z_i$, can be interpreted as the number of standard deviations $x_i$ is from the mean $\bar{x}$.
For example, $z_1 = 1.2$ would indicate that $x_1$ is 1.2 standard deviations greater than the sample mean.
Similarly, $z_2 = -0.5$ would indicate that $x_2$ is .5, or 1/2, standard deviation less than the sample mean.
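A minimal sketch computing every z-score of a hypothetical sample whose mean is 6 and sample standard deviation is 2, so each z-score is easy to read off by hand.

```python
from statistics import mean, stdev

def z_scores(data):
    """z_i = (x_i - sample mean) / s for every observation."""
    xbar, s = mean(data), stdev(data)
    return [(x - xbar) / s for x in data]

data = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical; mean 6, s = 2
print(z_scores(data))  # [-1.0, 1.0, 0.0, -0.5, -1.5, 0.5, 1.5, 0.0]
```

For instance, the value 5 has z = −0.5: it lies half a standard deviation below the mean.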
Data Exploration
Five Number Summary:
In a five-number summary, the following five numbers are used to summarize the data:
1. Smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value
Data Exploration
Five Number Summary:
For Graduate Salary Data:
Data Exploration

Box Plot:
A box plot is a graphical summary of data that is
based on a five-number summary.
Data Exploration
Box Plot:
Data Exploration
Outliers:
Sometimes a data set will have one or more
observations with unusually large or unusually
small values. These extreme values are called
outliers.
Data Exploration
Outliers:
Experienced statisticians take steps to identify
outliers and then review each one carefully.
Data Exploration
Outliers:
• An outlier may be a data value that has been incorrectly recorded. If so, it can be corrected before further analysis.
• An outlier may also be from an observation that was incorrectly included in the data set; if so, it can be removed.
Data Exploration
Outliers:
• Finally, an outlier may be an unusual data value that has been recorded correctly and belongs in the data set. In such cases it should remain.
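The five-number summary and a screen for candidate outliers can be sketched together. Two assumptions are labeled here: the quartiles use a textbook percentile rule (i = (p/100)n; round a fractional i up, average positions i and i + 1 when i is whole), and outliers are flagged with the widely used limits Q1 − 1.5 IQR and Q3 + 1.5 IQR, which are not shown in the surviving slide text. Flagged values still need the review described above. The data are hypothetical.

```python
import math

def percentile(values, p):
    """pth percentile (0 < p < 100) of a sorted list, assumed rule i = (p/100) * n."""
    i = p * len(values) / 100
    if i == int(i):
        k = int(i)
        return (values[k - 1] + values[k]) / 2
    return values[math.ceil(i) - 1]

def five_number_summary(data):
    """(smallest, Q1, median, Q3, largest) using the percentile rule above."""
    v = sorted(data)
    return v[0], percentile(v, 25), percentile(v, 50), percentile(v, 75), v[-1]

def flag_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

data = [46, 54, 42, 46, 32, 50, 48, 40, 80, 44, 52, 38]  # hypothetical
print(five_number_summary(data))  # (32, 41.0, 46.0, 51.0, 80)
print(flag_outliers(data))        # [80]
```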
Correlation

Sometimes we may be interested in measuring the relationship between two variables in our data.
Correlation

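Covariance between X and Y:
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$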
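A minimal sketch of the sample covariance on hypothetical paired data (hours studied and test score). A positive value indicates that above-average x tends to pair with above-average y.

```python
def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
print(sample_covariance(hours, score))  # 10.25 (positive association)
```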


Correlation

Covariance:
Example: Sample data for Stereo & Sound
Equipment.
Correlation

Covariance:
Scatter diagram
Correlation

Interpretation of Covariance:
Pearson’s Correlation Coefficient

Karl Pearson’s Correlation Coefficient:


A unit-free measure of the linear relationship between two variables.
Pearson’s Correlation Coefficient

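Karl Pearson’s Correlation Coefficient:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
Where,
$r_{xy}$ = sample correlation coefficient
$s_{xy}$ = sample covariance
$s_x$ = sample standard deviation of x
$s_y$ = sample standard deviation of y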
Pearson’s Correlation Coefficient

Pearson’s Correlation Coefficient:


We conclude that a strong positive linear
relationship occurs between the number of
commercials and sales.
More specifically, an increase in the number of
commercials is associated with an increase in
sales.
Pearson’s Correlation Coefficient

Pearson’s Correlation Coefficient: Note


1. $r_{xy}$ ranges from −1 to +1.
2. It measures the linear relationship between two variables.
3. Both variables must be quantitative to calculate the coefficient.
4. The correlation coefficient is unit free.
Pearson’s Correlation Coefficient

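Alternative Way of Calculation:
$r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$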
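The deviation form $s_{xy}/(s_x s_y)$ and the raw-sums form above should agree; this sketch checks that on hypothetical paired data and also confirms the unit-free property (rescaling x leaves r unchanged). Function names are illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """r_xy = s_xy / (s_x * s_y); the (n - 1) divisors cancel."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    syy = sum((b - yb) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pearson_r_raw_sums(x, y):
    """[n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2]) from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

hours = [1, 2, 3, 4, 5]          # hypothetical paired data
score = [52, 55, 61, 64, 68]
print(round(pearson_r(hours, score), 4), round(pearson_r_raw_sums(hours, score), 4))
```

Both functions print the same value, about 0.9944, a strong positive linear relationship.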
Pearson’s Correlation Coefficient

Work Example:
• Calculate Karl Pearson’s coefficient of correlation
between expenditure on advertising and sales from the
data given below.

• Comment on your finding.


Pearson’s Correlation Coefficient

Home Work: (1)
