2-Descriptive Statistics
Numerical Measures
MEASURES OF LOCATION
• Mean
• Median
• Mode
• Percentiles
• Quartiles
Mean:
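The formula from the original slide is not reproduced in this text. As a minimal sketch, assuming the usual definition (the sum of the observations divided by the number of observations), the mean can be computed as follows. The data values are hypothetical.

```python
# Hypothetical observations, used only for illustration.
data = [4, 8, 15, 16, 23, 42]

# Mean = sum of the observations / number of observations.
mean = sum(data) / len(data)
print(mean)
```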
Median:
The median is another measure of central
location. The median is the value in the middle
when the data are arranged in ascending order
(smallest value to largest value).
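The definition above can be sketched in code. This assumes the common convention that, for an even number of observations, the median is the average of the two middle values; the function name and data are hypothetical.

```python
def median(values):
    """Return the middle value of the data after sorting in ascending order."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return xs[mid]
    # Even number of observations (assumed convention): average the two middle values.
    return (xs[mid - 1] + xs[mid]) / 2

print(median([7, 1, 5]))
print(median([7, 1, 5, 3]))
```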
Mode:
The mode is the value that occurs with greatest
frequency.
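A short sketch of finding the mode by counting frequencies. Because several values can tie for the greatest frequency, this hypothetical helper returns all of them; the data are illustrative only.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([4, 4, 1]))        # a single mode
print(modes([1, 2, 2, 3, 3]))  # two values tie, so both are returned
```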
Percentiles:
A percentile provides information about how
the data are spread over the interval from the
smallest value to the largest value.
For data that do not contain numerous repeated
values, the pth percentile divides the data into
two parts. Approximately p percent of the
observations have values less than the pth
percentile; approximately (100-p) percent of the
observations have values greater than the pth
percentile.
Percentiles (Formal Definition):
The pth percentile is a value such that at least p
percent of the observations are less than or
equal to this value and at least (100-p) percent
of the observations are greater than or equal to
this value.
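One common textbook procedure that satisfies the formal definition is sketched below. It is an assumption that the slides use this rule: compute the index i = (p/100)n; if i is not an integer, round up and take the value in that position; if i is an integer, average the values in positions i and i+1. The function name and data are hypothetical.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) by the index rule described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(values)
    n = len(xs)
    # i = p*n/100; q is its integer part, r is nonzero when i is not an integer.
    q, r = divmod(p * n, 100)
    q = int(q)
    if r:
        # Not an integer: round up, so use position q+1 (1-based).
        return xs[q]
    # Integer: average the values in positions q and q+1 (1-based).
    return (xs[q - 1] + xs[q]) / 2

data = [12, 7, 3, 9, 15, 21, 5, 18]  # hypothetical observations
print(percentile(data, 30), percentile(data, 50))
```

Other software may use different interpolation rules, so small differences between tools are expected.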
Quartiles:
It is often desirable to divide data into four
parts, with each part containing approximately
one-fourth, or 25% of the observations.
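The quartiles Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles. As a sketch, Python's standard `statistics.quantiles` returns the three cut points; its default interpolation method may differ slightly from the rule used in the slides, and the data below are hypothetical.

```python
import statistics

scores = [12, 7, 3, 9, 15, 21, 5, 18]  # hypothetical observations

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```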
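Grouped Data: AVERAGE or MEAN
When data are given as a frequency distribution
or as grouped data, the mean is obtained by
$\bar{x} = \frac{\sum_i f_i x_i}{N}$
where $x_i$ is the i-th value, $f_i$ is its frequency, and
$N = \sum_i f_i$ is the total number of observations.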
Example:
Value (x)   Frequency (f)   xf
1           5               5
2           9               18
3           12              36
4           17              68
5           14              70
6           10              60
7           6               42
TOTAL       N = 73          299
Mean = 299/73 ≈ 4.10
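The grouped-data mean from the example can be checked with a few lines of code. The values and frequencies are those of the example table; the variable names are illustrative.

```python
# Values and frequencies from the example table.
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [5, 9, 12, 17, 14, 10, 6]

total_n = sum(freqs)                                  # N, total number of observations
total_xf = sum(x * f for x, f in zip(values, freqs))  # sum of the xf column
grouped_mean = total_xf / total_n

print(total_n, total_xf, round(grouped_mean, 2))
```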
MEASURES OF VARIABILITY
• Range
• Inter Quartile Range (IQR)
• Variance
• Standard Deviation
• Coefficient of Variation (CV)
Range:
Range = Largest Value - Smallest Value
Range: (Illustration)
The largest starting salary is 3925 and the
smallest is 3310. The range is 3925 - 3310 = 615.
Variance:
For Raw Data:
Standard Deviation:
The standard deviation (sd) is defined as the
positive square root of the variance. The
population sd is denoted by σ; the sample sd is
denoted by s.
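The variance formula from the slide is not reproduced in this text. The sketch below assumes the usual definitions: the sum of squared deviations from the mean, divided by n for a population (giving σ²) or by n - 1 for a sample (giving s²). The function name and data are hypothetical.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n-1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n

obs = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

pop_var = variance(obs, sample=False)
pop_sd = math.sqrt(pop_var)                 # sigma: positive square root of the variance
samp_sd = math.sqrt(variance(obs, sample=True))  # s

print(pop_var, pop_sd, samp_sd)
```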
Note:
Moments:
• Raw Moments (moments about the origin, 0)
• Central Moments (moments about the sample mean)
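The two kinds of moments can be sketched as follows, assuming the usual definitions with divisor n: the r-th raw moment is the average of x^r, and the r-th central moment is the average of (x - mean)^r. The function names and data are hypothetical.

```python
def raw_moment(values, r):
    """r-th moment about the origin: average of x**r."""
    return sum(x ** r for x in values) / len(values)

def central_moment(values, r):
    """r-th moment about the mean: average of (x - mean)**r."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

print(raw_moment(sample, 1))      # first raw moment equals the mean
print(central_moment(sample, 1))  # first central moment is always zero
print(central_moment(sample, 2))  # second central moment is the variance (divisor n)
```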
MEASURES OF DISTRIBUTION SHAPE
Measures of Skewness:
Other Measures of Skewness:
Measure of Kurtosis
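The formulas on these slides are not reproduced in this text. As a sketch under the assumption that moment-based measures are intended, skewness can be taken as m3 / m2^(3/2) and kurtosis as m4 / m2^2, where m_r is the r-th central moment. Other measures of skewness exist and are not covered here. Names and data are hypothetical.

```python
def _central_moment(values, r):
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

def skewness(values):
    """Moment-based skewness (assumed definition): m3 / m2**1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis (assumed definition): m4 / m2**2; equals 3 for a normal curve."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2

shape_data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(skewness(shape_data), kurtosis(shape_data))
```

A positive skewness value indicates a longer right tail; a symmetric data set gives a value of zero.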
Pooled Mean & Pooled Variance
Suppose the data are given in different groups in
the following fashion.
Groups    Sample Size   Sample Mean   Sample Variance
Group-1   $n_1$         $\bar{x}_1$   $s_1^2$
Group-2   $n_2$         $\bar{x}_2$   $s_2^2$
Group-3   $n_3$         $\bar{x}_3$   $s_3^2$
To find the overall mean taking all the groups
together, we use the pooled mean (overall mean).
To find the overall variance taking all the groups
together, we use the pooled variance (composite
variance).
Pooled Mean:
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3}$
where $n_i$ is the sample size and $\bar{x}_i$ is the sample mean of group i.
Example:
In three villages, A, B, and C, the number of
people living are 150, 190, and 220, respectively.
In a survey it was found that the average daily
incomes for the 3 villages are 200 INR, 150 INR,
and 230 INR with standard deviations of incomes
as 10, 12, and 16, respectively.
Compute the overall average daily income and
standard deviation of the income considering all
the 3 villages together.
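The worked solution from the slides is not reproduced in this text. The sketch below computes the overall mean as the size-weighted average of the village means. For the overall variance it assumes one common convention, which may differ from the one on the slides: each village's variance plus the squared distance of its mean from the overall mean, weighted by village size and divided by the total size.

```python
import math

# Figures from the village example above.
sizes = [150, 190, 220]  # people in villages A, B, C
means = [200, 150, 230]  # average daily income (INR)
sds = [10, 12, 16]       # standard deviation of income

total = sum(sizes)
pooled_mean = sum(n * m for n, m in zip(sizes, means)) / total

# Assumed convention: within-village variance plus squared shift of each
# village mean from the overall mean, weighted by size, divided by total size.
pooled_var = sum(
    n * (s ** 2 + (m - pooled_mean) ** 2)
    for n, m, s in zip(sizes, means, sds)
) / total
pooled_sd = math.sqrt(pooled_var)

print(round(pooled_mean, 2), round(pooled_sd, 2))  # about 194.82 and 36.76
```

The overall standard deviation is much larger than any single village's because the village means themselves are far apart.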
MEASURES OF RELATIVE LOCATION
Z-Score:
Measures of relative location help us determine
how far a particular value is from the mean.
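The z-score formula from the slide is not reproduced in this text. A minimal sketch, assuming the usual definition z = (x - mean) / s with s the sample standard deviation, is shown below; the function name and data are hypothetical.

```python
import statistics

def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n-1)
    return [(x - mean) / s for x in values]

readings = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
z = z_scores(readings)
print([round(v, 2) for v in z])
```

A z-score of zero means the value equals the mean; negative values lie below the mean and positive values above it.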
Data Exploration
Box Plot:
A box plot is a graphical summary of data that is
based on a five-number summary.
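The five-number summary behind a box plot consists of the smallest value, Q1, the median (Q2), Q3, and the largest value. A small sketch, using `statistics.quantiles` with its default interpolation (which may differ slightly from the rule in the slides) and hypothetical data:

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

box_data = [12, 7, 3, 9, 15, 21, 5, 18]  # hypothetical observations
print(five_number_summary(box_data))
```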
Outliers:
Sometimes a data set will have one or more
observations with unusually large or unusually
small values. These extreme values are called
outliers.
Experienced statisticians take steps to identify
outliers and then review each one carefully.
• An outlier may be a data value that has
been incorrectly recorded. If so, it can be
corrected before further analysis.
• An outlier may also be from an observation
that was incorrectly included in the data
set; if so, it can be removed.
• Finally, an outlier may be an unusual data
value that has been recorded correctly and
belongs in the data set. In such cases it
should remain.
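The text above does not state a rule for identifying outliers. One widely used convention, assumed here for illustration, flags values more than 1.5 × IQR below Q1 or above Q3. The function flags candidates for review only; whether to correct, remove, or keep each one follows the guidance above. Names and data are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention, k=1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

measurements = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]  # hypothetical observations
outliers = flag_outliers(measurements)
print(outliers)
```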
Correlation
Covariance:
Example: Sample data for Stereo & Sound
Equipment.
Scatter diagram
Interpretation of Covariance:
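The covariance formula and its interpretation slides are not reproduced in this text. As a sketch, assuming the usual sample covariance (sum of products of deviations divided by n - 1): a positive value indicates that x and y tend to increase together, a negative value that one tends to decrease as the other increases, and a value near zero indicates no linear association. The data are hypothetical and are not the Stereo & Sound Equipment sample.

```python
def sample_covariance(xs, ys):
    """Sum of (x - mean_x)(y - mean_y) divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]         # hypothetical data
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [5, 4, 5, 4, 2]    # tends to fall as x rises

print(sample_covariance(x, y_up))    # positive
print(sample_covariance(x, y_down))  # negative
```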
Pearson’s Correlation Coefficient
$r_{XY} = \frac{s_{XY}}{s_X s_Y}$
Where,
$s_{XY}$ is the sample covariance of X and Y, and
$s_X$, $s_Y$ are the sample standard deviations of X and Y.
Work Example:
• Calculate Karl Pearson’s coefficient of correlation
between expenditure on advertising and sales from the
data given below.
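The data table for this exercise is not reproduced in this text, so the sketch below uses hypothetical advertising and sales figures purely to show the calculation. It computes Pearson's coefficient as the sum of products of deviations divided by the square root of the product of the sums of squared deviations, which is algebraically the same as covariance divided by the product of the standard deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical figures, NOT the data from the exercise.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
print(round(r, 4))
```

The coefficient always lies between -1 and +1; values near +1 indicate a strong positive linear relationship.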