Chapter 4
4.1 INTRODUCTION
STATISTICS
Technique of collecting, tabulating, presenting, and summarizing information to describe data.
Technique of analyzing data and making inferences to draw conclusions about a population.
[Diagram: Population and Sample]
DEFINITIONS EXPLANATIONS
Population: A population consists of all elements (individuals, items, or objects) whose characteristics are being studied.
Sample: A portion of the population selected for study is referred to as a sample.
Census (Malay: bancian): A survey that includes every member of the population.
Parameter: A parameter of a population is some quantity that relates to the population, such as its mean or median.
EXAMPLE 1
DATA
DEFINITIONS EXPLANATIONS
Discrete Data: Data that can be counted.
Examples: number of houses, number of cars.
VARIABLES
Type of Variable
Continuous Discrete
DEFINITIONS EXPLANATIONS
Variable: A variable is a characteristic under study that assumes different values for different elements. In contrast to a variable, the value of a constant is fixed.
Quantitative: A variable that can be measured numerically is called a quantitative variable.
Discrete: A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values with no intermediate values.
Continuous: A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable.
Qualitative or categorical: A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. The data collected on such a variable are called qualitative data.
EXAMPLE 2:
Indicate which of the following variables are quantitative and which are qualitative. Hence,
classify the quantitative variables as discrete or continuous.
a) Number of typographical errors in newspapers
b) Monthly TV cable bills
c) Spring break locations favored by college students
d) Number of cars owned by families
e) Lottery revenues of states
LEVEL OF MEASUREMENT
DEFINITIONS EXPLANATIONS
Nominal: The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
Example:
- Gender, religion, political party, marital status.
Ordinal: The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Example:
- From student evaluations, guest speakers might be ranked as superior, average, or poor.
- Floats in a homecoming parade might be ranked as first place, second place, etc.
EXAMPLE 3:
SAMPLING METHOD EXPLANATIONS
Random Sampling: Subjects are selected by random numbers.
Systematic Sampling: Subjects are selected by using every kth number after the first subject is randomly selected from 1 through k.
Stratified Sampling: Subjects are selected by dividing up the population into groups (strata), and subjects are randomly selected within groups.
Cluster Sampling: Subjects are selected by using an intact group that is representative of the population.
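The four sampling methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of subject IDs 1 to 100, the two strata names, the ten clusters, and all function names are hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: pick n distinct subjects using random numbers."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k subjects, then every kth."""
    start = rng.randrange(k)  # 0-based index of a subject in positions 1..k
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: random selection within each group (stratum)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, rng):
    """Cluster sampling: pick one intact group and keep every subject in it."""
    return rng.choice(clusters)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical subject IDs 1..100
strata = {"first_year": population[:50], "second_year": population[50:]}  # hypothetical strata
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # hypothetical intact groups

random_ids = simple_random_sample(population, 10, rng)
systematic_ids = systematic_sample(population, 10, rng)
stratified_ids = stratified_sample(strata, 5, rng)
cluster_ids = cluster_sample(clusters, rng)
print(random_ids, systematic_ids, stratified_ids, cluster_ids, sep="\n")
```

Note that the cluster sketch simply picks one group at random; whether that group is representative of the population still has to be judged by the researcher.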
DATA COLLECTION
• The next step after the sample is identified and selected by using the appropriate sampling
technique is to determine the best way to reach the respondents in order to obtain the
required data.
• There are several methods of collecting data and each has its own advantages and
disadvantages.
• A researcher must choose the methods that provide the most information at minimum cost.
• The common methods of data collection are as follows:
a) Face-to-face interview (personal interview)
b) Telephone interview
c) Direct questionnaire (questionnaires are distributed and collected personally)
d) Mail or postal questionnaire (questionnaires are sent and received back through the
post)
e) Direct observation (respondents are observed and data recorded)
f) Other methods (e-mail, video recording)
EXAMPLE 4
QUALITATIVE DATA
Example:
Twenty-five army inductees were given a blood test to determine their blood type. The data set
is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
A frequency table summarizes the data collected by forming categories of values and indicating the number of data values that fall into each category.
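As a minimal sketch, the tally for the 25 blood types listed above can be produced with Python's `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter

# The 25 blood types from the data set, read row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

frequency = Counter(blood_types)

# Print one row per category: blood type and its frequency.
for category in ("A", "B", "O", "AB"):
    print(f"{category:<3}{frequency[category]:>3}")
```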
2) PIE CHART
A pie chart consists of a circle that is divided into sectors to show the number of objects or the percentage in each group or category. The angle of each sector is proportional to the number or percentage of elements in the category.
i. The number of people with type A blood is $\frac{20}{100} \times 25 = 5$ people.
ii. More people have type O blood (36%) than any other type.
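Because each sector angle is proportional to the category's share, the percentage and angle follow directly from the frequencies. The sketch below assumes the counts tallied from the 25 blood tests (A = 5, B = 7, O = 9, AB = 4); the variable names are illustrative.

```python
# Frequencies tallied from the 25 blood tests in the data set.
frequency = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequency.values())

sectors = {}
for category, count in frequency.items():
    percentage = 100 * count / total  # share of the whole, in percent
    angle = 360 * count / total       # sector angle, in degrees
    sectors[category] = (percentage, angle)
    print(f"{category:<3}{percentage:6.1f}%{angle:8.1f} degrees")
```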
3) BAR CHART
A bar chart uses the length of vertical columns or horizontal bars to represent quantities
or percentages.
[Figure: bar chart "Type of Blood"; x-axis: type of blood, y-axis: frequency; A = 5, B = 7, O = 9, AB = 4]
EXAMPLE 5
N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T
The pie chart shows the population of an area. If the number of employees is 1500, find the
number of
a) Widowed
b) Single
EXAMPLE 7
The graphs show that first-year college students spend the most on electronic equipment.
The following data are the scores in the Mathematics Account test for the 29 students of Class 2A.
50 50 50 50 53 53 53 54 61 62
64 64 64 68 68 70 79 79 79 79
79 79 80 80 83 83 83 95 95
This plot separates data entries into leading digits and trailing digits. The guidelines for
constructing stem-and-leaf plots are as follows.
i. Split each score or value into two sets of digits. The first (or leading) set of digits
is the stem, and the second (or trailing) set of digits is the leaf.
ii. List all the possible stem digits from the lowest to the highest.
iii. For each score in the mass of data, write down the leaf numbers on the line
labelled by the appropriate stem number.
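The three guidelines can be sketched in Python as follows. This is a minimal illustration that assumes the tens digits form the stem and the units digit forms the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems listed from lowest to highest."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # guideline i: split into stem and leaf
        leaves[stem].append(leaf)       # guideline iii: write the leaf on its stem's line
    # guideline ii: list every possible stem, even one that has no leaves
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

plot = stem_and_leaf(scores)
for stem, stem_leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in stem_leaves))
```

For three-digit values the same split gives two-digit stems (for example, 145 becomes stem 14 and leaf 5).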
i. mode = 79
ii. median = 68
iii. minimum = 50
iv. maximum = 95
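These four values can be checked against the raw scores with Python's standard `statistics` module; the sketch below is a verification aid only, and the dictionary name `summary` is illustrative.

```python
import statistics

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

summary = {
    "mode": statistics.mode(scores),      # most frequent score
    "median": statistics.median(scores),  # middle (15th) of the 29 ordered scores
    "minimum": min(scores),
    "maximum": max(scores),
}
print(summary)
```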
A frequency table summarizes the data collected by forming intervals of values and
indicating the number of data that falls into each interval.
This frequency table with class intervals is known as the frequency distribution of
grouped data.
The grouping of data is often desirable because it reduces the complexity of the
data and helps to smooth out irregularities in the distribution.
There are several guidelines that can be followed in constructing a grouped
frequency distribution.
i. Firstly, the class intervals should be mutually exclusive. This means that the
class intervals should not overlap and must be clearly defined.
ii. Secondly, it is a good practice to ensure that class intervals are of equal
width except for open-ended classes. If there are no observations in a
particular interval, it should still be included to avoid a misleading
impression of the data.
iii. Thirdly, there should neither be too few classes nor too many classes. The
rule of thumb is, the number of classes should not be less than 5 and should
not be more than 15.
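A grouped frequency distribution that follows these guidelines can be sketched as below. The classes 40-49 through 100-109 (equal width 10, seven classes, empty classes kept) are an assumption chosen to match the class labels used in the figures of these notes.

```python
# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

width = 10  # equal class width (guideline ii)
table = []
for lower in range(40, 110, width):  # assumed classes 40-49, 50-59, ..., 100-109
    upper = lower + width - 1        # classes do not overlap (guideline i)
    midpoint = (lower + upper) / 2   # class mark
    frequency = sum(lower <= score <= upper for score in scores)
    table.append((f"{lower}-{upper}", midpoint, frequency))

print("Class     Midpoint  Frequency")
for label, midpoint, frequency in table:
    print(f"{label:<10}{midpoint:<10}{frequency}")
```

Empty classes (40-49 and 100-109 here) stay in the table, as guideline ii recommends.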
Class Midpoint Frequency
70-79 74.5 7
80-89 84.5 5
90-99 94.5 2
100-109 104.5 0
[Figure: histogram of frequency against score; x-axis marks at 49, 59, 69, 79, 89, 99, 109]
Frequency polygon
A frequency polygon is obtained by connecting the midpoints (or class marks) of the classes, plotted at the
top of each bar in the histogram.
[Figure: frequency polygon; x-axis: classes 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; y-axis: frequency from 0 to 9]
4) Ogive
i. “more than” cumulative frequency curve, where the cumulative frequency is the sum of
the frequencies for classes above that class.
ii. “less than” cumulative frequency curve, where the cumulative frequency is the sum of
the frequencies for classes below that class.
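Both kinds of cumulative frequency can be computed from the class frequencies. The sketch below assumes classes 50-59 through 90-99 for the 29 test scores and expresses the cumulative frequencies as percentages; the variable names are illustrative.

```python
from itertools import accumulate

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

# Frequencies for the assumed classes 50-59, 60-69, 70-79, 80-89, 90-99.
frequencies = [sum(lower <= s <= lower + 9 for s in scores)
               for lower in range(50, 100, 10)]
total = sum(frequencies)

# "Less than" counts at the boundaries 49, 59, 69, 79, 89, 99.
less_than = [0] + list(accumulate(frequencies))
# "More than" counts at the same boundaries.
more_than = [total - count for count in less_than]

less_than_pct = [round(100 * count / total, 2) for count in less_than]
more_than_pct = [round(100 * count / total, 2) for count in more_than]
print(less_than_pct)
print(more_than_pct)
```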
[Figure: "less than" ogive; y-axis: cumulative frequency (%); x-axis: upper limits 49, 59, 69, 79, 89, 99; plotted values 0.00%, 27.59%, 51.72%, 75.86%, 93.10%, 100.00%]
[Figure: "more than" ogive; y-axis: cumulative frequency (%); x-axis: lower limits 59, 69, 79, 89, 99, 109; plotted values 100.00%, 72.41%, 48.28%, 24.14%, 6.90%, 0.00%]
EXAMPLE 8
A listing of calories per 1 ounce of selected salad dressings (not fat-free) is given below.
Construct a stem and leaf plot for the data.
100 130 130 130 110 110 120 130 140 100
140 170 160 130 160 120 150 100 145 145
145 115 120 100 120 160 140 120 180 100
160 120 140 150 190 150 180 160
Stem Leaf
EXAMPLE 9
Shown is an ogive depicting the cumulative frequency of the average mathematics SAT scores
by state.
Type Formula
Mean ($\bar{x}$): $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Mode: The mode of a set of data is the value that occurs most frequently.
Median: The median of a data set is the middle value when the original data values are arranged in descending or ascending numerical order.
Median/Quartile/Percentile
Interquartile Range: $\text{IQR} = Q_3 - Q_1$
Q1 P25 X 1
n
4
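As a rough illustration, the mean, quartiles, and interquartile range of the 29 test scores can be computed with Python's `statistics` module. Textbooks use different rules to locate quartiles; the `method="exclusive"` option used here places the quartiles at positions based on n + 1, which is an assumption and may not match the position rule intended in these notes.

```python
import statistics

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

mean = statistics.mean(scores)  # sum of the scores divided by n

# Quartiles under the assumed (n + 1)-based position rule.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range, Q3 - Q1

print(f"mean = {mean:.2f}, Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```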
The positions of the mean, median, and mode on the histogram or frequency curve can be
used to determine the general shape of the data distribution:
positively skewed
negatively skewed
Symmetrical
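$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}}$$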
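The computation (sum of squares, minus the squared sum divided by n, all over n - 1) can be sketched in Python and cross-checked against the standard library. The function name `sample_variance` is illustrative, and the 29 test scores from earlier in the chapter are used as the data.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance from the sum of squares and the squared sum."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]

s2 = sample_variance(scores)
s = math.sqrt(s2)  # standard deviation is the square root of the variance
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```

The result should agree with `statistics.variance(scores)`, which also divides by n - 1.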
COEFFICIENT OF VARIATION
The coefficient of variation is the standard deviation divided by the mean of the same
data set, and expressed as a percentage.
Formula:
$$\text{Coefficient of Variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
A larger coefficient of variation means that the data is more dispersed and less
consistent.
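A small sketch of how the coefficient of variation compares two data sets measured in different units. It assumes the sample standard deviation (divisor n - 1) and uses two data sets from this chapter: the 29 test scores and the 10 systolic blood pressures of Example 13. The function name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Scores of the 29 students of Class 2A.
scores = [50, 50, 50, 50, 53, 53, 53, 54, 61, 62,
          64, 64, 64, 68, 68, 70, 79, 79, 79, 79,
          79, 79, 80, 80, 83, 83, 83, 95, 95]
# Systolic blood pressures (mm Hg) of the 10 patients in Example 13.
pressures = [146, 135, 151, 155, 158, 146, 149, 124, 162, 173]

cv_scores = coefficient_of_variation(scores)
cv_pressures = coefficient_of_variation(pressures)
print(f"scores: {cv_scores:.1f}%, pressures: {cv_pressures:.1f}%")
```

The data set with the smaller percentage is the more consistent one relative to its own mean.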
Boxplot
[Table: for each shape of distribution, the graph, the measures of location (relative positions of mean, median, and mode), the box-plot (positions of Q1, Q2, and Q3), and the measure of central tendency (median or mean)]
EXAMPLE 11
The stem and leaf diagram shows the number of flies caught in an insect trap for 27 days.
Stem | Leaf
0 | 1 1 2
1 | 2 3 5 5 6
2 | 2 2 3 5 8 8
3 | 4 4 4 4 5 7 7 9
4 | 2 6 7 7 8
Key: 1 | 2 means 12
(a) Find
(b) Illustrate the above data by constructing a box and whisker plot. Hence, describe the
skewness of the distribution.
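A box and whisker plot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below reads the 27 values off the stem and leaf diagram and computes the summary; the quartile rule (`method="exclusive"`, positions based on n + 1) is an assumption and may differ from the rule used in class.

```python
import statistics

# Values read from the stem and leaf diagram (key: 1 | 2 means 12).
flies = [1, 1, 2,
         12, 13, 15, 15, 16,
         22, 22, 23, 25, 28, 28,
         34, 34, 34, 34, 35, 37, 37, 39,
         42, 46, 47, 47, 48]

# Quartiles under the assumed (n + 1)-based position rule.
q1, median, q3 = statistics.quantiles(flies, n=4, method="exclusive")
five_number_summary = (min(flies), q1, median, q3, max(flies))
print(five_number_summary)
```

Comparing the distances from the median to each quartile, and the lengths of the two whiskers, indicates the direction of any skewness.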
EXAMPLE 12
The table shows the distribution of grades of students for a certain subject in an examination.
Grade 1 2 3 4 5 6 7 8 9
Number of Students 7 13 9 7 7 2 1 1 1
(a) Find
(b) Construct the box and whisker plot. Hence, state the shape of distribution.
EXAMPLE 13
The following are the systolic blood pressures, in mm Hg, of 10 patients in a hospital.
146 135 151 155 158 146 149 124 162 173
(a) Find the mean and mode. Describe the shape of the distribution.
(b) Find the standard deviation of the systolic blood pressure of the 10 patients. Hence, find
Pearson's coefficient of skewness. Comment on the distribution.
(c) Find the number of patients whose systolic blood pressures exceed one standard deviation
above or below the mean.
EXAMPLE 14