Module 1

The document discusses organizing and presenting numerical data through various statistical methods. It covers ordered arrays, frequency distributions, histograms, polygons and cumulative distributions as ways to summarize and visualize numerical data.

Uploaded by

Anoop Thomas

Module 1

Learning Objective
• To convert raw data to useful information

• To construct and use data arrays

• To construct and use frequency distributions

• To graph frequency distributions with histograms, polygons, and ogives

• To use frequency distributions to make decisions

What is Statistics?
Statistics: The science of collecting, describing, and interpreting data.

Two areas of statistics:

Descriptive Statistics: collection, presentation, and description of sample data.

Inferential Statistics: making decisions and drawing conclusions about populations.
Introduction
• The field of statistics is the science of learning from data.
• Statistical knowledge helps you use the proper methods to collect
the data, employ the correct analyses, and effectively present the
results.
• Statistics is a crucial process behind how we make discoveries in
science, make decisions based on data, and make predictions.
• Statistics allows you to understand a subject much more deeply.

Statistics uses Numerical Evidence to draw valid Conclusions

Types of Statistics

Descriptive statistics: collecting, summarizing and describing the data.
Inferential statistics: drawing conclusions and/or making decisions concerning a population based only on sample data.
Example: A recent study examined the math and verbal SAT scores of high school seniors
across the country. Which of the following statements are descriptive in nature and
which are inferential?

• The mean math SAT score was 492.

• The mean verbal SAT score was 475.

• Students in the Northeast scored higher in math but lower in verbal.

• 80% of all students taking the exam were headed for college.

• 32% of the students scored above 610 on the verbal SAT.

• The math SAT scores are higher than they were 10 years ago.
Introduction to Basic Terms
Population: A collection, or set, of individuals or objects or events
whose properties are to be analyzed.
Two kinds of populations: finite or infinite.

Sample: A subset of the population.

Variable: A characteristic about each individual element of a population or sample.

Data (singular): The value of the variable associated with one element of a population
or sample. This value may be a number, a word, or a symbol.

Data (plural): The set of values collected for the variable from each of the elements
belonging to the sample.

Experiment: A planned activity whose results yield a set of data.

Parameter: A numerical value summarizing all the data of an entire population.

Statistic: A numerical value summarizing the sample data.

Example: A college dean is interested in learning about the average age of faculty.
Identify the basic terms in this situation.

The population is the age of all faculty members at the college.

A sample is any subset of that population. For example, we might select 10 faculty
members and determine their age.
The variable is the “age” of each faculty member.
One data value (a datum) would be the age of a specific faculty member.
The data would be the set of values in the sample.
The experiment would be the method used to select the ages forming the sample and
determining the actual age of each faculty member in the sample.
The parameter of interest is the “average” age of all faculty at the college.
The statistic is the “average” age for all faculty in the sample.
Difference between Samples and Populations
▪ Statisticians gather data from a sample. They use this information to make inferences about the
population that the sample represents. Thus, a population is a whole, and a sample is a fraction or
segment of that whole.

▪ We will study samples in order to be able to describe populations. Our hospital may study a small,
representative group of X-ray records rather than examining each record for the last 50 years. The Gallup
Poll may interview a sample of only 2,500 adult Americans in order to predict the opinion of all adults
living in the United States.

▪ Studying samples is easier than studying the whole population; it costs less and takes less time. Often,
testing an airplane part for strength destroys the part; thus, testing fewer parts is desirable. Sometimes
testing involves human risk; thus, use of sampling reduces that risk to an acceptable level.
Presenting Data in Tables and Charts
Organizing Numerical Data

[Diagram: Numerical data can be organized as an ordered array, or as frequency distributions and cumulative distributions, which are presented as tables, histograms, polygons, and ogives.]

Organizing Numerical Data:
Ordered Array
▪ An ordered array is a sequence of data, in rank order, from the smallest value to the largest
value.
▪ Shows range (minimum value to maximum value)
▪ May help identify outliers (unusual observations)

Weight of sample of daily production in kg

Day Shift
22 17 25 42 18 32
38 19 20 27 21 22
16 17 20 18 19 18

Night Shift
18 23 19 32 33 41
18 28 19 20 21 45
Organizing Numerical Data:
Ordered Array
The same data arranged as ordered arrays:

Weight of sample of daily production in kg

Day Shift
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Shift
18 18 19 19 20 21
23 28 32 33 41 45
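
As a minimal sketch, the ordered arrays above can be reproduced by sorting the raw shift data. The function names `ordered_array` and `value_range` are illustrative choices, not part of the original slides.

```python
# Raw daily production weights (kg) copied from the slide.
day_shift = [22, 17, 25, 42, 18, 32, 38, 19, 20, 27, 21, 22, 16, 17, 20, 18, 19, 18]
night_shift = [18, 23, 19, 32, 33, 41, 18, 28, 19, 20, 21, 45]


def ordered_array(values):
    """Return the values in rank order, smallest to largest."""
    return sorted(values)


def value_range(values):
    """Return (minimum, maximum), the range that an ordered array makes visible."""
    ordered = ordered_array(values)
    return ordered[0], ordered[-1]


day_sorted = ordered_array(day_shift)
night_sorted = ordered_array(night_shift)
print("Day shift:  ", day_sorted, value_range(day_shift))
print("Night shift:", night_sorted, value_range(night_shift))
```

Once the data are sorted, the extremes (16 and 42 for the day shift, 18 and 45 for the night shift) and the repeated values can be read off directly.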
Organizing Numerical Data:
Ordered Array
• Advantages:

• We can quickly notice the lowest and highest values in the data.
• We can easily divide the data into sections.
• We can see whether any values appear more than once in the array.
• We can observe the distance between succeeding values in the data.

Disadvantage:

• It is a cumbersome form for displaying large quantities of data.

• We need to compress the information and still be able to use it for
interpretation and decision making
Organizing Numerical Data:
Frequency Distribution
▪ The frequency distribution is a summary table in which the data are arranged into numerically ordered classes.

▪ You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
Organizing Numerical Data:
Frequency Distribution
Relative Frequency Distribution
• We can also express the frequency of each value as a fraction or a percentage of the total number of
observations.
• A relative frequency distribution presents frequencies in terms of fractions or percentages.
Guidelines for Selecting Width of Classes

• $\text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$
Organizing Numerical Data:
Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 samples of iron rod and records the temperature of each.

17, 13, 12, 24, 24, 21, 27, 26, 27, 35, 30, 32, 37, 41, 38, 44, 43, 58, 46, 53
Organizing Numerical Data:
Frequency Distribution Example
• Data in Ordered Array:
• 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
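
The slide's frequency table did not survive extraction, so here is a hedged sketch of how one could be built from the ordered array. It assumes five classes of width 10 starting at 10 (consistent with the class boundaries 10 to 60 shown on the ogive axis), with each class including its lower boundary and excluding its upper boundary. The function names are illustrative.

```python
# Temperatures from the ordered array on the slide.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


def approximate_class_width(values, number_of_classes):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / number_of_classes


def frequency_distribution(values, first_lower, width, number_of_classes):
    """Return (lower, upper, frequency, relative frequency) for each class."""
    counts = [0] * number_of_classes
    for value in values:
        # Assumed convention: lower boundary included, upper boundary excluded.
        counts[int((value - first_lower) // width)] += 1
    total = len(values)
    return [
        (first_lower + k * width, first_lower + (k + 1) * width, count, count / total)
        for k, count in enumerate(counts)
    ]


print(approximate_class_width(data, 5))  # 9.2, which suggests a convenient width of 10
table = frequency_distribution(data, first_lower=10, width=10, number_of_classes=5)
for lower, upper, frequency, relative in table:
    print(f"{lower} but under {upper}: {frequency} ({relative:.0%})")
```

Under these assumptions the class frequencies come out as 3, 6, 5, 4 and 2, and the relative frequencies are those counts divided by 20.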
Graphing Numerical Data:
The Histogram
[Figure: Histogram of the data above. Vertical axis: frequency (0 to 7). Horizontal axis: class boundaries, with class midpoints 5, 15, 25, 35, 45, 55 and a final "More" class. Bar heights: 0, 3, 6, 5, 4, 2, 0. There are no gaps between bars.]
Graphing Numerical Data:
The Frequency Polygon
[Figure: Frequency polygon of the data above. Vertical axis: frequency (0 to 7). Horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More.]
Tabulating Numerical Data:
Cumulative Frequency
▪ Shows the number of items with values less than or equal to the
upper limit of each class

▪ Shows the proportion of items with values less than or equal to the
upper limit of each class.

▪ Shows the percentage of items with values less than or equal to the
upper limit of each class.
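
The three views described above (count, proportion, percentage) can be sketched as follows for the 20 temperature readings used earlier. The upper class boundaries 20, 30, 40, 50, 60 are an assumption taken from the ogive axis, and a value counts toward a class only if it lies strictly below that class's upper boundary, which matches the "10 but under 20" style of class used in the histogram.

```python
# Temperatures from the ordered array used in the earlier example.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Assumed upper class boundaries (classes 10-20, 20-30, ..., 50-60).
upper_boundaries = [20, 30, 40, 50, 60]
n = len(data)

# Number of items below each upper boundary.
cumulative_frequency = [sum(1 for v in data if v < upper) for upper in upper_boundaries]
# The same information as a proportion and as a percentage of all items.
cumulative_proportion = [c / n for c in cumulative_frequency]
cumulative_percentage = [100 * c / n for c in cumulative_frequency]

for upper, c, pct in zip(upper_boundaries, cumulative_frequency, cumulative_percentage):
    print(f"below {upper}: {c} items ({pct:.0f}%)")
```

Plotting `cumulative_percentage` against `upper_boundaries` gives the points of the ogive.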
Graphing Numerical Data:
The Ogive (Cumulative % Polygon)
• A cumulative frequency distribution enables us to see how many
observations lie above or below certain values, rather than merely
recording the number of items within intervals.
[Figure: Ogive (cumulative percentage polygon) of the data above. Vertical axis: cumulative percentage (0 to 100). Horizontal axis: class boundaries (not midpoints) 10, 20, 30, 40, 50, 60.]

Stem and Leaf Plot
A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first digit
or digits) and a "leaf" (usually the last digit).
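
A minimal sketch of the idea, assuming non-negative integers where the stem is the tens part and the leaf is the units digit. The data set is borrowed from the temperature example earlier in this module purely for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    plot = defaultdict(list)
    for value in sorted(values):
        plot[value // 10].append(value % 10)
    return dict(plot)


data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row keeps the individual values visible while still showing the shape of the distribution, which is what distinguishes this plot from a histogram.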
Measures of Central Tendency and Dispersion in Frequency Distributions
Central Tendency
• Central tendency is the middle point of a distribution. Measures of
central tendency are also called measures of location.

Central Tendency
In the following figure, the central location of curve B lies to the right of
those of curve A and curve C. Notice that the central location of curve
A is equal to that of curve C.

COMPARISON OF CENTRAL LOCATION OF THREE CURVES

Dispersion
• Dispersion is the spread of the data in a distribution, that is, the
extent to which the observations are scattered. Notice that curve A in
the figure has a wider spread, or dispersion, than curve B.

COMPARISON OF DISPERSION OF TWO CURVES

• Skewness: Curves representing the data points in the data set may be either symmetrical or skewed. Symmetrical curves,
like the one in the following figure, are such that a vertical line drawn from the center of the curve to the horizontal axis
divides the area of the curve into two equal parts. Each part is the mirror image of the other.

COMPARISON OF CENTRAL LOCATION OF THREE CURVES

• Kurtosis: When we measure the kurtosis of a distribution, we are measuring its peakedness.
• For example, curves A and B differ only in that one is more peaked than the other. They have the same central location
and dispersion, and both are symmetrical. Statisticians say that the two curves have different degrees of kurtosis.

TWO CURVES WITH THE SAME CENTRAL LOCATION BUT DIFFERENT KURTOSIS
Ungrouped vs Grouped Data
• Ungrouped data
• Ungrouped data, also known as raw data, is data that has not been placed in any
group or category after collection.
• Data can be categorized by number or by characteristic; data that has not been
put into any such category is ungrouped.
• For example, the number of individuals residing in an area is ungrouped data, or raw information,
because nothing has been categorized.
• We can therefore conclude that ungrouped data is data used to show information on an
individual member of a sample or population.
Ungrouped vs Grouped Data
• Grouped data

• Grouped data is data that is classified into groups after collection.

• The number of individuals residing in an area is ungrouped data, or raw information, because nothing has been categorized. But if it is categorized as number of men, number of women and number of children, then it is considered grouped data.

• The raw data is categorized into various groups and a table is created.

• The primary purpose of the table is to show the data points occurring in each group.
A MEASURE OF CENTRAL TENDENCY
1. The Arithmetic Mean
2. The Weighted Mean
3. The Median
4. The Mode
Calculating the Mean from Ungrouped Data
A MEASURE OF CENTRAL TENDENCY
The Arithmetic Mean
• Suppose the monthly income (in Rs) of six families is given as:
1600, 1500, 1400, 1525, 1625, 1630.

• The mean family income is obtained by adding up the incomes and dividing by the number of families.

$\text{Arithmetic Mean} = \frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \frac{9280}{6} \approx 1546.67$

A MEASURE OF CENTRAL TENDENCY
The Arithmetic Mean
• A sample of a population consists of n observations (a lowercase $n$) with a mean of $\bar{x}$.

• Remember that the measures we compute for a sample are called statistics.

• The notation is different when we are computing measures for the entire
population, that is, for the group containing every element we are describing.

• The mean of a population is symbolized by $\mu$, which is the Greek letter mu.

• The number of elements in a population is denoted by the capital italic letter $N$.

Calculating the Mean from Ungrouped Data

$\mu$ = Population Arithmetic Mean

$\bar{x}$ = Sample Arithmetic Mean

Example
• Following Table shows the score obtained by seven different students
taking an online preparatory quiz.
Table: Quiz Marks
Students 1 2 3 4 5 6 7
Marks Obtained 9 7 7 6 4 4 2

• We Calculate the Mean of this sample of seven students as follows:

• $\bar{x} = \frac{\sum x}{n}$

• $\bar{x} = \frac{9+7+7+6+4+4+2}{7} = \frac{39}{7} \approx 5.6$ ← Sample mean
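
The ungrouped mean is simply the sum of the observations divided by their count. A small sketch, using the quiz marks and the family incomes from the slides (the function name `arithmetic_mean` is an illustrative choice):

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


quiz_marks = [9, 7, 7, 6, 4, 4, 2]
family_incomes = [1600, 1500, 1400, 1525, 1625, 1630]

print(round(arithmetic_mean(quiz_marks), 1))      # 39 / 7, about 5.6
print(round(arithmetic_mean(family_incomes), 2))  # 9280 / 6, about 1546.67
```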
Calculating the Mean from Grouped Data
• A frequency distribution consists of data that are grouped by classes.
• Each value of an observation falls somewhere in one of the classes.
• Unlike the last example, we do not know the separate values of every observation.

• Steps to be followed:
• Calculate the midpoint of each class. To make midpoints come out in whole cents, we round
up. For example, the midpoint for the first class becomes 25.00, rather than 24.995.
• Multiply each midpoint by the frequency of observations in that class.
• Sum all these results, and divide the sum by the total number of observations in the sample.
Calculating the Mean from Grouped Data
• Formula
• $\bar{x} = \frac{\sum (f \times x)}{n}$
• Where
• $\bar{x}$ = sample mean
• $\sum$ = symbol meaning “the sum of”
• $f$ = frequency (number of observations) in each class
• $x$ = midpoint for each class in the sample
• $n$ = number of observations in the sample
Example
Calculating the Mean from Grouped Data-
Coding
• Using a technique called coding, we eliminate the problem of large or inconvenient midpoints.
• Instead of using the actual midpoints to perform our calculations, we can assign small-value consecutive
integers (whole numbers) called codes to each of the midpoints.
• The integer zero can be assigned anywhere, but to keep the integers small, we will assign zero to the midpoint
in the middle (or the one nearest to the middle) of the frequency distribution.
• Then we can assign negative integers to values smaller than that midpoint and positive integers to those
larger, as follows:

Class      1-5   6-10   11-15   16-20   21-25   26-30   31-35   36-40   41-45
Code (u)   -4    -3     -2      -1      0       1       2       3       4

(The midpoint of class 21-25, which is assigned code 0, is $x_0$.)
Calculating the Mean from Grouped Data-
Coding
• Formula
• $\bar{x} = x_0 + w \frac{\sum (u \times f)}{n}$
Symbolically, statisticians use $x_0$ to represent the midpoint that is assigned the code 0, and $u$ for the coded midpoint.
• Where
• $\bar{x}$ = mean of sample
• $x_0$ = value of the midpoint assigned the code 0
• $w$ = numerical width of the class interval
• $u$ = code assigned to each class
• $f$ = frequency or number of observations in each class
• $n$ = total number of observations in the sample
Calculating the Mean from Grouped Data-
Coding Example
• Following table represents how to code the midpoints and find the sample mean of annual rainfall (in inches)
over 20 years in Kochi, Kerala

Annual Rainfall (Class)   Frequency   Lower Class Boundary

0-7                       2
8-15                      6
16-23                     3
24-31                     5
32-39                     2
40-47                     2

Class width: w = 8
• Coding works because the mean respects linear transformations: if $Y = mx + c$, then $\bar{Y} = m\bar{x} + c$.
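
To illustrate, the rainfall table can be worked both ways: directly from the midpoints and through coding. This is a sketch under two assumptions that the slide does not state: each midpoint is taken as the average of the printed class limits (for example 3.5 for class 0-7), and code 0 is assigned to class 16-23.

```python
# Rainfall classes and frequencies from the slide; class width w = 8.
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39), (40, 47)]
frequencies = [2, 6, 3, 5, 2, 2]
width = 8

# Assumption: midpoint = average of the printed class limits.
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(frequencies)

# Direct method: sum of f * x divided by n.
direct_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Coding method: assumed code 0 on class 16-23 (index 2), then -2, -1, 0, 1, 2, 3.
zero_index = 2
codes = [k - zero_index for k in range(len(classes))]
x0 = midpoints[zero_index]
coded_mean = x0 + width * sum(u * f for u, f in zip(codes, frequencies)) / n

print(direct_mean, coded_mean)
```

Both routes give the same mean (21.5 inches under these assumptions), because coding is just a linear change of scale that is undone by the $x_0 + w(\cdot)$ step.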
The Weighted Mean
o The arithmetic mean, as discussed earlier, gives equal importance (or weight) to each observation
in the data set.
o However, there are situations in which the values of individual observations in the data set are not of
equal importance.
o If values occur with different frequencies, then computing the A.M. of values (as opposed to the
A.M. of observations) may not be truly representative of the data set characteristic and thus may
be misleading.
o Under these circumstances, we may attach to each observation value a ‘weight’ $w_1, w_2, \ldots, w_N$ as
an indicator of its importance, perhaps because of size, and compute a weighted
mean or average denoted by $\bar{x}_w$ as

$\bar{x}_w = \frac{\sum (w \times x)}{\sum w}$
$\bar{x}_w$ = symbol for the weighted mean
$w$ = weight assigned to each observation
When to use weighted arithmetic mean

• (i) when the importance of all the numerical values in the given data
set is not equal.

• (ii) when the frequencies of various classes are widely varying.

• (iii) where there is a change either in the proportion of numerical values or in the proportion of their frequencies.
Example
• A quiz was held to decide the award of a scholarship. The weights of various subjects were different. The
marks obtained by 3 candidates (out of 100 in each subject) are given below:

Subjects               Weights   Ron   Harry   Hermione
Microeconomics         4         60    57      62
Financial Accounting   3         62    61      67
Business Statistics    2         55    53      60
Business Ethics        1         67    77      49

• Calculate the weighted A.M. to award the scholarship

Example
• The owner of a general store was interested in knowing the mean contribution (sales price minus variable cost)
of his stock of 5 items. The data is given below:

Product Contribution Quantity Sold

1 6 160
2 11 60
3 8 260
4 4 460
5 14 110
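
Both weighted-mean examples above can be checked with one small helper. This is a sketch: the name `weighted_mean` is illustrative, and using quantity sold as the weight in the store example is an assumption, since the slide does not say so explicitly.

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Scholarship quiz: subject weights and marks per candidate.
subject_weights = [4, 3, 2, 1]
marks = {
    "Ron": [60, 62, 55, 67],
    "Harry": [57, 61, 53, 77],
    "Hermione": [62, 67, 60, 49],
}
scores = {name: weighted_mean(m, subject_weights) for name, m in marks.items()}
winner = max(scores, key=scores.get)
print(scores, winner)

# General store: contribution per product, weighted by quantity sold (assumed weight).
contributions = [6, 11, 8, 4, 14]
quantities = [160, 60, 260, 460, 110]
mean_contribution = weighted_mean(contributions, quantities)
print(round(mean_contribution, 2))
```

With these weights the candidates score 60.3, 59.4 and 61.8, so the highest weighted mean belongs to Hermione, and the mean contribution per unit sold is about 6.74.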
The Median
• Median may be defined as the middle value in the data set when its elements are
arranged in a sequential order, that is, in either ascending or descending order of
magnitude.

• It is called a middle value in an ordered sequence of data in the sense that half of
the observations are smaller and half are larger than this value.

• The median is thus a measure of the location or centrality of the observations.

• The median can be calculated for both ungrouped and grouped data sets.
The Median – for ungrouped data
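In this case the data is arranged in either ascending or descending order of magnitude.

Median Value
If the number of observations (n) is an odd number:
$\text{Med} = \left(\frac{n+1}{2}\right)\text{th observation}$

If the number of observations (n) is an even number:
$\text{Med} = \frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}}{2}$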
The Median (ungrouped data) – Examples

1. Calculate the median of the following data that relates to the service time
(in minutes) per customer for 7 customers at a railway reservation
counter:

3.5, 4.5, 3, 3.8, 5.0, 5.5, 4

2. Calculate the median of the following data that relates to the number of
patients examined per hour in the outpatient ward (OPD) in a hospital:

10, 12, 15, 20, 13, 24, 17, 18
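
The two exercises can be worked with a short function that applies the odd and even rules after sorting. This is a sketch; the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


service_times = [3.5, 4.5, 3, 3.8, 5.0, 5.5, 4]
patients_per_hour = [10, 12, 15, 20, 13, 24, 17, 18]

print(median(service_times))      # 4th of 7 sorted values
print(median(patients_per_hour))  # average of the 4th and 5th of 8 sorted values
```

For the service times the sorted data are 3, 3.5, 3.8, 4, 4.5, 5.0, 5.5, so the median is 4 minutes; for the patient counts the two middle values are 15 and 17, giving a median of 16.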

The Median – for grouped data
• To find the median value for grouped data, first identify the class interval which
contains the median value or (n/2)th observation of the data set.
• To identify this class interval, accumulate the class frequencies until you reach the
first class whose cumulative frequency is equal to or greater than n/2.
• The value of the median within that class is found by using interpolation. That is, it
is assumed that the observation values are evenly spaced over the entire class
interval.
• The following formula is used to determine the median of grouped data:

• $\text{Med} = l + \frac{\frac{n}{2} - cf}{f} \times h$
The Median – for grouped data
• $\text{Med} = l + \frac{\frac{n}{2} - cf}{f} \times h$
• where
• $l$ = lower class limit (or boundary) of the median class interval.
• $cf$ = cumulative frequency of the class prior to the median class
interval, that is, the sum of all the class frequencies up to, but not
including, the median class interval
• $f$ = frequency of the median class
• $h$ = width of the median class interval
• $n$ = total number of observations in the distribution.
The Median (grouped data) – Example
• A survey was conducted to determine the age (in years) of 120 automobiles. The
result of such a survey is as follows:

Age of Auto No of Auto Lcb cf

0-4 13
4-8 29
8-12 48
12-16 22
16-20 8
The Median (grouped data) – Example

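Age of Auto   No of Auto   Lcb   cf

0-4           13           0     13
4-8           29           4     42
8-12          48           8     90
12-16         22           12    112
16-20         8            16    120

The median class is 8-12, because it is the first class whose cumulative frequency (90) reaches n/2 = 60.

Median = 8 + (((120/2) − 42)/48) × 4 = 9.5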
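
The interpolation can be sketched as a small function that walks the cumulative frequencies until it reaches the median class. The function and parameter names are illustrative, and the sketch assumes all classes share the same width.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: l + ((n/2 - cf) / f) * h."""
    n = sum(frequencies)
    cumulative = 0  # cf: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:
            # This is the median class; interpolate inside it.
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")


lower_boundaries = [0, 4, 8, 12, 16]
frequencies = [13, 29, 48, 22, 8]
median_age = grouped_median(lower_boundaries, frequencies, width=4)
print(median_age)
```

For the automobile survey this reproduces 8 + ((60 − 42)/48) × 4 = 9.5 years.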
The Mode
• The mode is the value that is repeated most often in the data set.
Mode for the ungrouped data
Example
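
The slide's own mode example did not survive extraction. As a stand-in, here is a sketch that reuses the quiz marks from the arithmetic mean example, which is an assumption and not the original example. It uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count.

```python
from collections import Counter
from statistics import multimode

# Quiz marks reused from the arithmetic mean example (not the original mode example).
quiz_marks = [9, 7, 7, 6, 4, 4, 2]

counts = Counter(quiz_marks)   # how often each value occurs
modes = multimode(quiz_marks)  # all values that share the highest count

print(counts)
print(sorted(modes))
```

Here 7 and 4 each occur twice, so the data set has two modes (it is bimodal).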
DISPERSION
Suppose over the six-year period the net profits (in percentage) of two firms are as follows:
Firm 1 : 5.8, 5.5, 5.0, 5.7, 5.1, 5.4 (mean ≈ 5.42)
Firm 2 : 5.6, 3.2, 3.3, 4.3, 4.0, 12.1 (mean ≈ 5.42)
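
A short sketch shows why a measure of central tendency alone is not enough here: the two firms have the same mean but very different spreads. The helper names are illustrative.

```python
firm_1 = [5.8, 5.5, 5.0, 5.7, 5.1, 5.4]
firm_2 = [5.6, 3.2, 3.3, 4.3, 4.0, 12.1]


def mean(values):
    """Arithmetic mean of the observations."""
    return sum(values) / len(values)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


for name, profits in (("Firm 1", firm_1), ("Firm 2", firm_2)):
    print(f"{name}: mean = {mean(profits):.2f}, range = {value_range(profits):.1f}")
```

Both means are 32.5 / 6, but Firm 1's profits span only 0.8 percentage points while Firm 2's span 8.9.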
USEFUL MEASURES OF DISPERSION

• Range
• Interquartile Range
• Variance and Standard deviation
Range
• The range is the simplest measure of dispersion and is based on the
location of the largest and the smallest values in the data.

• Thus the range is defined to be the difference between the largest and lowest
observed values in a data set.

• Range (R) = Highest value of an observation – Lowest value of an observation = H – L
Range – Example (Ungrouped Data)
• The following are the sales figures of a firm for the last 12 months

Months 1 2 3 4 5 6 7 8 9 10 11 12
Sales (Rs ’000) 80 82 82 84 84 86 86 88 88 90 90 92

• Calculate the range of the given data.

Range – Example (Grouped Data)
• The following data show the waiting time (in seconds) before telephone calls are connected:

Waiting Time (Seconds)   Frequency
10-25                    6
26-50                    10
51-75                    8
76-100                   4
101-125                  4
Coefficient of Range
• This is a relative measure of dispersion and is based on the
value of the range. It is also called range coefficient of
dispersion.

• Given by (H – L) / (H + L)
Example

• The following are the wages of 8 workers in a factory. Find the range and coefficient of range. Wages are in dollars:
1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.
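
A minimal sketch of both measures, applied to the wage data and to the monthly sales figures from the earlier ungrouped example. The function names are illustrative.

```python
def value_range(values):
    """Range R = H - L."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure (H - L) / (H + L)."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)


wages = [1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440]
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]

print(value_range(wages), round(coefficient_of_range(wages), 3))
print(value_range(sales))
```

For the wages, H = 1575 and L = 1380, so the range is 195 and the coefficient of range is 195 / 2955, about 0.066. For the sales figures the range is 92 − 80 = 12.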
Range

Advantages
• (i) It is independent of the measure of central tendency and easy to calculate and understand.
• (ii) It is quite useful in cases where the purpose is only to find out the extent of extreme variation, such as industrial quality control, temperature, rainfall, and so on.

Disadvantages
• (i) The calculation of range is based on only two values, the largest and smallest in the data set, and fails to take account of any other observations.
• (ii) It is largely influenced by two extreme values and completely independent of the other values. For example, the range of the two data sets {1, 2, 3, 7, 12} and {1, 1, 1, 12, 12} is 11, but the two data sets differ in terms of overall dispersion of values.
• (iii) It does not describe the variation among values in the data between two extremes.
Quartiles
It is often desirable to divide data into four parts, with each part
containing approximately one-fourth, or 25% of the observations.

Q1 = First quartile, or 25th percentile

Q2 = second quartile, or 50th percentile (also the median)
Q3 = Third quartile, or 75th percentile
PERCENTILE

• The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.
PERCENTILE calculation steps
Step 1. Arrange the data in ascending order (smallest value to largest value).

Step 2. Compute an index i = (p/100) * n (n = sample size)

Step 3.

(a) If i is not an integer, round up. The next integer greater than i denotes the
position of the pth percentile.

(b) If i is an integer, the pth percentile is the average of the values in positions i and
i + 1.
Example: Determine the 85th percentile
3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925

i = (85/100) × 12 = 10.2

Because i is not an integer, round up. The position of the 85th percentile is the
next integer greater than 10.2, the 11th position.

We see that the 85th percentile is the data value in the 11th position, or 3730.
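
The three steps can be sketched as a function. To avoid floating-point surprises when testing whether i is an integer, this sketch assumes p is a whole number strictly between 0 and 100 and works with the integer product p × n. The function name is illustrative.

```python
def percentile(values, p):
    """pth percentile by the index rule i = (p / 100) * n, for integer 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                  # Step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)     # Step 2: i = whole + remainder / 100
    if remainder:
        # Step 3(a): i is not an integer, so round up to position whole + 1
        # (index `whole` when counting from 0).
        return ordered[whole]
    # Step 3(b): i is an integer, so average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


data = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(percentile(data, 85))  # i = 10.2, so the 11th value
print(percentile(data, 50))  # i = 6, so the average of the 6th and 7th values
```

Other percentile conventions exist (statistical software often interpolates), so results from a library routine may differ slightly from this rule.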
Interquartile Range
• The limitations or disadvantages of the range can partially be overcome by using another measure
of variation, which measures the spread over the middle half of the values in the data set so as to
minimize the influence of outliers (extreme values).

• Since a large number of values in the data set lie in the central part of the frequency distribution,
it is useful to study the Interquartile Range (also called midspread).

• To compute this value, the entire data set is divided into four parts, each of which contains 25 per
cent of the observed values. The quartiles Q1, Q2 and Q3 are the values that separate these four parts.

• The interquartile range is a measure of dispersion or spread of values in the data set between the
third quartile, Q3 and the first quartile, Q1.

• In other words, the interquartile range or deviation (IQR) is the range for the middle 50 per cent
of the data.
Interquartile range (IQR) = Q3 – Q1
Interquartile Range
• The concept of IQR is shown in Fig.
Example
(i) Find the interquartile range of the given data
5, 8, 15, 26, 10, 18, 3, 12, 6, 14, 11

(ii) Find the interquartile range of the given data

11, 31, 21, 19, 8, 54, 35, 26, 29, 31, 35, 54
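
One way to work these two exercises is to take Q1 and Q3 as the 25th and 75th percentiles under the percentile index rule given earlier in this module. That choice is an assumption: other quartile conventions give slightly different answers. The function names are illustrative.

```python
def percentile(values, p):
    """pth percentile by the index rule i = (p / 100) * n, for integer 0 < p < 100."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)
    if remainder:
        return ordered[whole]                        # i not an integer: round up
    return (ordered[whole - 1] + ordered[whole]) / 2 # i integer: average positions i, i + 1


def interquartile_range(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as the 25th and 75th percentiles."""
    return percentile(values, 75) - percentile(values, 25)


data_1 = [5, 8, 15, 26, 10, 18, 3, 12, 6, 14, 11]
data_2 = [11, 31, 21, 19, 8, 54, 35, 26, 29, 31, 35, 54]

print(percentile(data_1, 25), percentile(data_1, 75), interquartile_range(data_1))
print(percentile(data_2, 25), percentile(data_2, 75), interquartile_range(data_2))
```

Under this rule, data set (i) has Q1 = 6 and Q3 = 15 (IQR = 9), and data set (ii) has Q1 = 20 and Q3 = 35 (IQR = 15).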
Quartiles of grouped data
Lower quartile: $Q_1 = l + \frac{\frac{N}{4} - f_c}{f_q} \times w$

Upper quartile: $Q_3 = l + \frac{\frac{3N}{4} - f_c}{f_q} \times w$

$l$ is the lower class boundary of the quartile class
$N$ is the total frequency of the distribution
$f_c$ is the cumulative frequency before the quartile class
$f_q$ is the frequency of the quartile class
$w$ is the class width
Example: Step 1
Marks Freq. C.F LCB
21-30 12
31-40 21
41-50 34
51-60 20
61-70 6
71-80 4
81-90 3
Example: Step 1
Marks Freq. C.F LCB
21-30 12 12 20.5
31-40 21 33 30.5
41-50 34 67 40.5
51-60 20 87 50.5
61-70 6 93 60.5
71-80 4 97 70.5
81-90 3 100 80.5
Example: Upper quartile
Value   Fq   Fc    L
21-30   12   12    20.5
31-40   21   33    30.5
41-50   34   67    40.5
51-60   20   87    50.5
61-70   6    93    60.5
71-80   4    97    70.5
81-90   3    100   80.5

$Q_3 = l + \frac{\frac{3N}{4} - f_c}{f_q} \times w$

Now, $\frac{3N}{4} = 75$, hence $l = 50.5$; $f_c = 67$; $f_q = 20$; $w = 10$

$Q_3 = 50.5 + \frac{75 - 67}{20} \times 10 = 54.5$
Example: Lower quartile
Value   Fq   Fc    L
21-30   12   12    20.5
31-40   21   33    30.5
41-50   34   67    40.5
51-60   20   87    50.5
61-70   6    93    60.5
71-80   4    97    70.5
81-90   3    100   80.5

$Q_1 = l + \frac{\frac{N}{4} - f_c}{f_q} \times w$

Now, $\frac{N}{4} = 25$, hence $l = 30.5$; $f_c = 12$; $f_q = 21$; $w = 10$

$Q_1 = 30.5 + \frac{25 - 12}{21} \times 10 \approx 36.69$
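
Both quartiles follow the same interpolation pattern, so a single function with a `fraction` argument (0.25 for Q1, 0.75 for Q3) is enough. This is a sketch with illustrative names, assuming equal class widths.

```python
def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """l + ((fraction * N - fc) / fq) * w for the class that contains fraction * N."""
    total = sum(frequencies)
    target = fraction * total
    cumulative = 0  # fc: cumulative frequency before the current class
    for lower, fq in zip(lower_boundaries, frequencies):
        if cumulative + fq >= target:
            return lower + (target - cumulative) / fq * width
        cumulative += fq
    raise ValueError("frequencies must not be empty")


lower_boundaries = [20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
frequencies = [12, 21, 34, 20, 6, 4, 3]

q1 = grouped_quantile(lower_boundaries, frequencies, width=10, fraction=0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, width=10, fraction=0.75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

This reproduces Q1 ≈ 36.69 and Q3 = 54.5 from the worked example, giving an interquartile range of about 17.81 marks.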
Average Deviation Measures
• Two of these measures are important to our study of statistics: the variance
and the standard deviation. Both of these tell us an average distance of
any observation in the data set from the mean of the distribution.

• In statistics, the standard deviation is the usual way of measuring how far observations
typically lie from the mean (technically, it measures dispersion, that is, how spread out the data are).
Average Deviation Measures - Population
Variance and Standard deviation

Variance (for ungrouped data)
• Every population has a variance, which is symbolized by $\sigma^2$ (sigma squared).
• The formula for calculating the variance is
• $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Standard Deviation (for ungrouped data)
• The population standard deviation, or $\sigma$, is simply the square root of the population variance.
• Because the variance is the average of the squared distances of the observations from the mean, the standard deviation is the square root of the average of the squared distances of the observations from the mean.
• $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

$\mu = \frac{\sum x}{N}$ (Population mean)
Example
Shortcut formula for the population variance: $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$
Standard Score
Population standard score = $\frac{x - \mu}{\sigma}$

Suppose we observe a vial of compound that is 0.108 percent impure. Because our population has a mean
of 0.166 and a standard deviation of 0.058, an observation of 0.108 would have a standard score of – 1:

$\frac{0.108 - 0.166}{0.058} = -1$

An observed impurity of 0.282 percent would have a standard score of +2.

The standard score indicates that an impurity of 0.282 percent deviates from the mean by 2(0.058) = 0.116 unit,
which is equal to +2 in terms of the number of standard deviations away from the mean.
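
The two standard scores quoted above can be verified with a one-line function. This is a sketch; the name `standard_score` is an illustrative choice.

```python
def standard_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 0.166, 0.058  # population mean and standard deviation from the slide

z_low = standard_score(0.108, mu, sigma)
z_high = standard_score(0.282, mu, sigma)
print(round(z_low, 2), round(z_high, 2))
```

The results are −1 and +2 up to floating-point rounding, matching the slide.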
Example
• The wholesale prices of a commodity for seven consecutive days in a month are as follows:
Days                       1     2     3     4     5     6     7
Commodity price/quintal    240   260   270   245   255   286   264
• Calculate the variance and standard deviation.
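
A sketch of the calculation, treating the seven prices as a complete population (an assumption, since this example sits in the population part of the module and the slide does not say which formula is intended). The function name is illustrative.

```python
from math import sqrt


def population_variance(values):
    """Average of the squared deviations from the population mean."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


prices = [240, 260, 270, 245, 255, 286, 264]

variance = population_variance(prices)
std_dev = sqrt(variance)
print(variance, round(std_dev, 2))
```

The mean price is 1820 / 7 = 260, the squared deviations sum to 1442, so the variance is 206 and the standard deviation is about 14.35.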
Average Deviation Measures - Sample
Variance and Standard deviation

Variance (for ungrouped data)
• $s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$

Standard Deviation (for ungrouped data)
• $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$

$\bar{x} = \frac{\sum x}{n}$ (Sample mean)
Example
• Calculate the sample variance and standard deviation for the
following data
Observation
863
903
957
1,041
1,138
1,204
1,354
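
For the sample version the only change is the divisor, n − 1 instead of N. A sketch with an illustrative function name:

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


observations = [863, 903, 957, 1041, 1138, 1204, 1354]

s_squared = sample_variance(observations)
s = sqrt(s_squared)
print(round(s_squared, 2), round(s, 2))
```

The sample mean is 7460 / 7 ≈ 1065.71, which gives a sample variance of about 31,242.57 and a sample standard deviation of about 176.76.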
Average Deviation Measures - Population
Variance and Standard deviation

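Variance (for grouped data)
• $\sigma^2 = \frac{\sum f (x - \mu)^2}{N}$

Standard Deviation (for grouped data)
• $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum f (x - \mu)^2}{N}}$

$\mu = \frac{\sum (f \times x)}{N}$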
Average Deviation Measures - Population
Variance and Standard deviation (Coding)

Standard Deviation (for grouped data)
• $\sigma = \sqrt{\sigma^2} = w \times \sqrt{\frac{\sum (f \times u^2)}{N} - \left(\frac{\sum (f \times u)}{N}\right)^2}$
• $\mu = x_0 + w \frac{\sum (f \times u)}{N}$
Where
$\mu$ = mean of the population
$x_0$ = value of the midpoint assigned the code 0
$w$ = numerical width of the class interval
$u$ = code assigned to each class
$f$ = frequency or number of observations in each class
$N$ = total number of observations in the population
Example
• Calculate the mean, variance and standard deviation for the given
data
Value Frequency
30-39 2
40-49 12
50-59 22
60-69 20
70-79 14
80-89 4
90-99 1
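
This exercise can be sketched with the coding formulas and cross-checked against the direct grouped formulas. Several choices here are assumptions not stated on the slide: the data are treated as a population, midpoints are the averages of the printed class limits (34.5, 44.5, ...), the class width is 10, and code 0 is placed on class 60-69.

```python
from math import sqrt

classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [2, 12, 22, 20, 14, 4, 1]
width = 10

midpoints = [(low + high) / 2 for low, high in classes]
big_n = sum(frequencies)

# Coding: assumed code 0 on the middle class 60-69 (index 3).
zero_index = 3
codes = [k - zero_index for k in range(len(classes))]
x0 = midpoints[zero_index]

sum_fu = sum(f * u for f, u in zip(frequencies, codes))
sum_fu2 = sum(f * u * u for f, u in zip(frequencies, codes))

mean = x0 + width * sum_fu / big_n
variance = width ** 2 * (sum_fu2 / big_n - (sum_fu / big_n) ** 2)
std_dev = sqrt(variance)

# Cross-check with the direct grouped formula: sum of f * (x - mu)^2 / N.
direct_variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / big_n

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Under these assumptions N = 75, Σfu = −27 and Σfu² = 127, so the mean is 60.9, the variance is about 156.37 and the standard deviation is about 12.50; the direct formula gives the same variance.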
Summary
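Type of Data: Mean; Standard Deviation

Ungrouped, Population: $\mu = \frac{\sum x}{N}$; $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Ungrouped, Sample: $\bar{x} = \frac{\sum x}{n}$; $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$

Grouped, Population: $\mu = \frac{\sum (f \times x)}{N}$; $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum f (x - \mu)^2}{N}}$

Grouped, Population (coding): $\mu = x_0 + w \frac{\sum (f \times u)}{N}$; $\sigma = \sqrt{\sigma^2} = w \times \sqrt{\frac{\sum (f \times u^2)}{N} - \left(\frac{\sum (f \times u)}{N}\right)^2}$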
Coefficient of Variation
• Standard deviation is an absolute measure of variation and expresses variation in the same unit of measurement
as the arithmetic mean or the original data.
• A relative measure called the coefficient of variation (CV), developed by Karl Pearson, is very useful for
• (i) comparing two or more data sets expressed in different units of measurement
• (ii) comparing data sets that are in the same unit of measurement but whose mean values are widely dissimilar

$\text{Coefficient of variation (CV)} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100 = \frac{\sigma}{\mu} \times 100$
Example
Each day, laboratory technician A completes on average 40 analyses with a standard
deviation of 5. Technician B completes on average 160 analyses per day with a
standard deviation 15. Which employee shows the less variability?

$\text{Coefficient of variation (CV)} = \frac{\sigma}{\mu} \times 100$

$\text{CV} = \frac{5}{40} \times 100 = 12.5\%$ for Technician A

$\text{CV} = \frac{15}{160} \times 100 \approx 9.4\%$ for Technician B

So, we find that Technician B, who has more absolute variation in output than Technician A, has less relative
variation because the mean output for B is much greater than A.
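
The comparison can be sketched in a few lines. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(std_dev=5, mean=40)    # Technician A
cv_b = coefficient_of_variation(std_dev=15, mean=160)  # Technician B

less_variable = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, less_variable)
```

Technician A's CV is 12.5% and Technician B's is 9.375% (about 9.4%), so B shows less relative variability even though B's standard deviation is larger.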

