Math 01 Module DATA MANAGEMENT Enhanced
Module 3
Mathematics as a Tool: Data Management
Department of Mathematics
College of Arts and Sciences
Mariano Marcos State University
2020
Introduction
“It is easy to lie with statistics. It is hard to tell the truth without statistics.”
Andrejs Dunkel
BASIC TERMS
Some of the basic terminologies and notations involved in statistics are the
following:
a. Population - a collection or set of things or objects under consideration
b. Sample - a subset or representative group of the population
c. Data - the information gathered in a research study
Statistical data are classified according to their sources, namely: primary data
or secondary data.
Primary data – information gathered from respondents by the researcher
himself.
Secondary data – information obtained from published materials or data
gathered by other individuals or agencies. These are the data which are
transcribed from original sources.
d. Array – listing of observations which are arranged in an increasing or
decreasing magnitude
e. Parameter - a value which is computed from a population
f. Statistic – a value which is computed from a sample
g. Variable – a characteristic of interest that has been observed or measured on
every member of the population or sample.
A variable may be quantitative or qualitative where quantitative variable is
further classified as discrete or continuous.
i. Quantitative/Numerical variable – describes the amount or number of
an element of a sample or population
Discrete – takes on a countable amount (it is usually expressed as
whole number)
Example: number of books owned by a student
Continuous – measured in a continuous scale (it takes any value
within a range or interval)
Example: height of the students (in feet)
ii. Qualitative/Categorical variable – describes the quality, category, or
character of an element of a population or sample
Examples:
gender (male or female)
hair color (black, brown, blonde)
level of satisfaction of a student on his grade (highly satisfied,
satisfied, not satisfied)
Levels of Measurement
A more detailed distinction, termed as the levels of measurement, is used by
some researchers in examining the information that is collected. It is classified as
follows:
1. Nominal Measurement - numbers or symbols are used to code or classify
each element in the population. Note that the assigned numbers have no
numerical meaning.
Examples: gender, educational background, employment status
4. Ratio Measurement – A variable measured at this level not only includes the
concepts of order and interval, but also includes the idea of ’nothingness’, or
absolute zero.
Example: Measurement of height, weight, ages
Remark: The scale of measurement depends mainly on the method of
measurements and not on the property being measured.
One way of summarizing the data is to describe the data set using descriptive
measures. The most commonly used descriptive measures are the measures of
central tendency and the measures of dispersion.
The three measures of central tendency are the mean, median, and mode.
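Definition 4: The mean, also known as the arithmetic average, is the sum of all the
observed values divided by the number of observations in the data set. It can be
computed as
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where $x_i$ is the $i$-th observation and $n$ is the number of observations.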
Example 1: The scores of five students who are selected randomly in a class of
Math 01 are as follows: 44, 37, 41, 35 and 32. Find their average score.
Solution:
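Applying the formula for the mean of ungrouped data gives
\[ \bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8 \]
The average score of the five students is 37.8.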
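The computation in Example 1 can be checked with a short Python sketch. The variable names are illustrative, and `statistics.mean` from the standard library is used only as a convenient stand-in for the formula of the mean.

```python
from statistics import mean

# Scores of the five students in Example 1
scores = [44, 37, 41, 35, 32]

# Mean = sum of the observed values divided by the number of observations
average = sum(scores) / len(scores)

# The standard library gives the same result
library_average = mean(scores)

print(average)
```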
The means of subgroups can be combined to obtain the group mean, known as the
weighted mean. It can be calculated using the formula
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
where
$x_i$ is the $i$-th observation,
$f_i$ is the frequency or weight of the $i$-th observation, and
$n$ is the number of observations being weighted (the denominator is the total of the frequencies).
Example 2: If the final examination of a class in statistics is given the weight 2, the
average of the quizzes the weight 3, and a project report the weight 1, what would be the
mean grade of a student who got the grades 90, 85 and 87, respectively?
Solution:
\[ \bar{x} = \frac{2(90) + 3(85) + 1(87)}{2 + 3 + 1} = \frac{522}{6} = 87 \]
The mean grade of the student is 87.
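The weighted mean of Example 2 can be sketched in Python as follows. The helper name `weighted_mean` is hypothetical and is introduced here only for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [90, 85, 87]   # final examination, average of quizzes, project report
weights = [2, 3, 1]     # weights given in Example 2

result = weighted_mean(grades, weights)
print(result)
```

If all weights are equal, the function reduces to the ordinary mean.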
Remarks:
1. The mean may not be an actual observation in the data set.
2. The mean reflects the magnitude of every observation since every observation
contributes to the value of the mean.
3. The mean is not a good measure of central tendency if there is an extreme value
or observation since it is easily affected by extreme values. The best measure of
center for this case is the median.
Definition 5: The median is a single value which divides an array of observations into
two equal parts such that 50% of the observations fall above it and the remaining
50% fall below it. It is written symbolically as $\tilde{x}$, read as "x-tilde".
The median of a data set consisting of an odd number of observations is the
middlemost value in the list. That is, $\tilde{x} = x_{(n+1)/2}$, where n is the number of
observations.
If n is even, the median is the average of the two middlemost values. It can be
computed as $\tilde{x} = \frac{m_1 + m_2}{2}$, where $m_1$ and $m_2$ are the two middlemost values. Take note
that the observations are first arranged in an array form (from lowest to highest) before
getting the median value.
Example 1: The number of books owned by the eleven children are as follows: 5, 2, 4, 6,
5, 10, 7, 6, 9, 8, 6. What is the median?
Solution:
Arrange the data in an array form: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list
contains 11 numbers, the median is the middlemost value (the 6th number), which is 6.
Example 2: Compute the median of the data set: 2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7
Solution:
Forming an array, we have 2.5, 2.5, 3.5, 3.7, 4.0, 5.8, 7.1, 8.2. There are
n = 8 values, hence the median is calculated as
\[ \tilde{x} = \frac{x_4 + x_5}{2} = \frac{3.7 + 4.0}{2} = 3.85. \]
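Both median examples can be reproduced with a minimal Python sketch. The function `median_of` is a hypothetical helper that follows the odd and even cases of the definition directly. It is one possible way to write the rule, not the only one.

```python
def median_of(data):
    """Middle value for odd n; average of the two middle values for even n."""
    values = sorted(data)          # arrange in array form first
    n = len(values)
    middle = n // 2
    if n % 2 == 1:
        return values[middle]      # the ((n + 1) / 2)-th value
    return (values[middle - 1] + values[middle]) / 2

books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]               # Example 1, n = 11
readings = [2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7]      # Example 2, n = 8

median_books = median_of(books)
median_readings = median_of(readings)
print(median_books, median_readings)
```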
Remarks:
1. The median value may not be an actual observation in the data set.
2. The median is a positional value, hence, it is not affected by the presence of
extreme observations.
3. When the data are qualitative, the median is not a possible measure, so describe
the center by determining the mode.
Definition 6: The mode is an observation that occurs most frequently in the given
data set.
mode in set C are 25, 37 and 45 since these numbers have the highest frequency. Each
element in set D has the same number of occurrences, thus, the data set has no mode.
The distribution of data may be classified as unimodal, bimodal, trimodal or
multimodal distribution depending upon the number of modal values in the given data
set. In the above example, set A is unimodal, set B is bimodal and set C is trimodal.
Example 2: What is the modal color of the shirt worn by the students if the data gathered
were as follows: white, gray, gray, black, white, red, red, gray, black, white, white, red,
gray, red, gray, black, red, red, gray, gray, black?
Solution:
Since gray has the highest frequency, it follows that the modal color of the shirt
worn by the students is gray.
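The frequency count behind this answer can be sketched with `collections.Counter` from the Python standard library. The variable names are illustrative.

```python
from collections import Counter

# Shirt colors listed in Example 2
shirts = (
    "white gray gray black white red red gray black white white "
    "red gray red gray black red red gray gray black"
).split()

counts = Counter(shirts)                               # frequency of each color
modal_color, frequency = counts.most_common(1)[0]      # color with the highest frequency

print(counts)
print(modal_color, frequency)
```

Because the mode depends only on frequencies, the same code works for numerical data as well.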
Remarks:
1. The mode can be used for both quantitative and qualitative data.
2. It is very much affected by the method of grouping.
3. It is determined by the frequency and not by the values of the observations.
DO THESE!
1. Company ABC is awarding the top ten most outstanding workers in their
company every year. The ages of the top ten awardees for the year 2018 are 47,
53, 36, 60, 30, 28, 42, 43, 38 and 52. Determine the mean, median and mode of
the ages.
2. The mean weight of 50 Balikbayan boxes is 135 kg. What is the approximate
total weight of all the boxes?
3. The average height of four basketball players is 74 inches. If the heights of
three of the players are 69 inches, 72 inches and 78 inches, what is the height of the
fourth player?
4. What is the median of the distribution given by 23, 17, 12, 8, 14, 25, 19, 22, 18?
If the maximum value is replaced by 40, what effect will this have on the median?
How about if the minimum is replaced by 0?
5. The final grades of a student in the six subjects she enrolled in last semester are
shown below.
Subject Number of Units Final Grade
Calculus 1 5 2.25
English 3 3 2.0
Psychology 1 3 1.5
Finance 2 3 2.0
Accounting 3 6 2.25
Humanities 3 1.75
Determine her average grade. If the subjects were of equal number of units, what
would be her average?
MEASURE OF DISPERSION
In some cases, describing the data using the measures of central tendency
alone does not provide sufficient information about a population or
sample. It should be supplemented by an analysis of how the individual elements of
the population or sample cluster around the center. Thus, an
analysis of the variability of the observations may be applied.
The most commonly used measures of dispersion are the range, variance,
and standard deviation. The range is the simplest and easiest to compute, but it gives
only a rough estimate of dispersion.
Example 1. Compare the performances of the three students based on their ratings
Definition 8: The range, R, is the difference between the highest value (H) and
lowest value (L) in the data set. That is, R = H – L.
Example 2. The average daily allowances (in pesos) of 12 college students studying
at University Y are 112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25
and 113. Find the range.
Solution:
Given H = 165.5 and L = 99.75, the range is R = 165.5 − 99.75 = 65.75.
The range of the daily allowances of the 12 college students is 65.75 pesos.
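As a quick check of R = H − L for the allowance data, a minimal Python sketch is shown here. For these values the difference evaluates to 65.75.

```python
# Average daily allowances (in pesos) of the 12 students in Example 2
allowances = [112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25, 113]

highest = max(allowances)        # H
lowest = min(allowances)         # L
data_range = highest - lowest    # R = H - L

print(highest, lowest, data_range)
```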
Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.
Definition 9: The standard deviation is the positive square root of the variance.
The variance is the average of the squared deviations of every observation from
the mean.
The standard deviation and variance can be obtained from a population or a
sample, but most applications use the sample rather than the population because
complete enumeration of a population is often impractical. The unit of the variance
is the square of the unit of the data, while the unit of the standard deviation is the
same as that of the data set. The following symbols are used to designate these
measures for a population and a sample.
Measure              Population   Sample
Standard deviation   σ            s
Variance             σ²           s²
10 12 14 15 17 18 18 24
Find the variance and the standard deviation of the sample.
Solution:
x        x − x̄     (x − x̄)²
10       −6        36
12       −4        16
14       −2        4
15       −1        1
17       1         1
18       2         4
18       2         4
24       8         64
Total    128                 130
n = 8, x̄ = 16
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{130}{7} = 18.57 \]
\[ s = \sqrt{s^2} = \sqrt{\frac{130}{7}} = 4.31 \]
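The table of deviations and the two results can be reproduced with a short Python sketch. It follows the sample formulas (divisor n − 1). The variable names are illustrative, and `statistics.variance` is included only as a cross-check.

```python
from math import sqrt
from statistics import variance

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)

x_bar = sum(sample) / n                                  # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in sample]  # (x - x_bar)^2 for every x

s_squared = sum(squared_deviations) / (n - 1)            # sample variance
s = sqrt(s_squared)                                      # sample standard deviation

print(round(s_squared, 2), round(s, 2))
```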
DO THESE!
Answer the following. Show a complete and neat solution for each problem.
the strengths of 3 brands of 1/8-inch rope. The results of the tests are
shown in the following table. According to the same test results, which
company produces 1/8-inch rope for which the breaking point has the
smallest standard deviation?
Company        Breaking point of 1/8-inch rope (in pounds)
Trustworthy    122, 141, 151, 114, 108, 149, 125
Brand X        128, 127, 148, 164, 97, 109, 137
NeverSnap      112, 121, 138, 131, 134, 139, 135
4. Ten used trail bikes are randomly selected from a bike shop, and the
odometer reading of each is recorded as follows.
1902, 103, 653, 1901, 788, 361, 216, 363, 223, 656
Solve for the standard deviation and interpret.
position.
This measure divides the data set into subgroups such that a specific portion
of the data set belongs to the lower bracket and the remaining on the higher bracket.
Percentiles, deciles, and quartiles are among the most commonly used measures of
relative position.
of the total observations are found below P2 and 98% are above it, and so on.
located on the $i$-th place can be computed as $P_i = \frac{n \cdot i}{100}$.
and 9th decile ($D_9$). The lowest decile $D_1$ corresponds to a value in the set
such that 10% of all the observations are located below $D_1$, the second decile
$D_2$ corresponds to a value such that 20% of all the observations are lower than
$D_2$, and so on up to the last decile $D_9$, whose value is positioned such that
90% of all the observations are located below it.
Remarks:
1. The quartiles and deciles can be determined by solving for their equivalent percentiles.
a. $Q_1 = P_{25}$; $Q_2 = P_{50}$; $Q_3 = P_{75}$.
b. $D_1 = P_{10}$; $D_2 = P_{20}$; $D_3 = P_{30}$; ⋯; $D_9 = P_{90}$.
2. Given a data set, Median $= P_{50} = Q_2 = D_5$.
Example 1: Joy was told that relative to the other scores on a long exam in Statistics,
her score was at the 95th percentile. This means that at least 95% of those who took the
test had scores less than or equal to Joy's score, while at least 5% had a score higher
than Joy's.
Example 2: Given the following data set: 25, 5, 6, 12, 8, 16, 17, 22, 20, 9. Compute
for
a) 20th percentile c) first quartile e) 3rd decile
b) 56th percentile d) 2nd quartile f) seventh decile
Solutions:
Arrange the scores in an increasing manner.
5, 6, 8, 9, 12, 16, 17, 20, 22, 25
a) 20th percentile
n = 10, i = 20
\[ \text{location of } P_{20} = \frac{10(20)}{100} = \frac{200}{100} = 2 \]
This means that the 20th percentile is the second score from the lowest.
So, $P_{20} = 6$.
b) 56th percentile
\[ \text{location of } P_{56} = \frac{10(56)}{100} = \frac{560}{100} = 5.6 \approx 6 \]
When the result is not a whole number, round it to the nearest whole number. The
56th percentile is approximately described by the 6th value in the data set.
Thus, $P_{56} = 16$.
Note: Interpolation may be applied to find a more exact value
corresponding to the 56th percentile. A location of 5.6 means that the 56th
percentile is between the 5th and 6th values. To interpolate, multiply the
difference of the 5th and 6th values by the decimal part, then add the result to
the 5th value. That is, (16 − 12) × 0.6 = 2.4. So, $P_{56} = 12 + 2.4 = 14.4$, which
is the interpolated value.
c) first quartile
\[ \text{location of } P_{25} = \frac{10(25)}{100} = \frac{10}{4} = 2.5 \]
$P_{25}$ is located halfway between the 2nd and 3rd values in the list (6 and 8). So,
$P_{25} = 7$. Since $Q_1 = P_{25}$, therefore $Q_1 = 7$.
d) 2nd quartile
Note that $Q_2$ has the same value as the median. Solving for the
median gives $Md = \frac{12 + 16}{2} = 14$. So, $Q_2 = 14$.
e) 3rd decile
\[ \text{location of } P_{30} = \frac{10(30)}{100} = 3 \quad \text{(3rd value from the lowest)} \]
Therefore, $D_3 = 8$.
f) seventh decile
\[ \text{location of } P_{70} = \frac{10(70)}{100} = 7 \quad \text{(7th number in the list)} \]
The seventh decile is 17.
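The location rule used in these solutions can be sketched in Python. The function `percentile` below is a hypothetical helper. It assumes the location is computed as n·i/100, picks the value at a whole-number location, and interpolates linearly for a fractional location, as described in the note for the 56th percentile.

```python
def percentile(data, i):
    """Value at the i-th percentile using the location rule n * i / 100."""
    values = sorted(data)
    n = len(values)
    location = n * i / 100            # 1-based position in the sorted list
    if not 1 <= location <= n:
        raise ValueError("location falls outside the data set")
    whole = int(location)             # position of the lower neighbour
    fraction = location - whole       # decimal part of the location
    if fraction == 0:
        return values[whole - 1]
    lower, upper = values[whole - 1], values[whole]
    return lower + fraction * (upper - lower)

scores = [25, 5, 6, 12, 8, 16, 17, 22, 20, 9]

print(percentile(scores, 20))   # 20th percentile
print(percentile(scores, 56))   # 56th percentile, interpolated
print(percentile(scores, 25))   # first quartile
print(percentile(scores, 30))   # third decile
print(percentile(scores, 70))   # seventh decile
```

For i = 50 this rule returns the 5th value (12), whereas the worked solution obtains the second quartile from the median (14). The sketch does not attempt to reconcile the two conventions.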
covering Q1 , median, and Q3 then draw a line segment across the box
passing through the median.
4. Connect the box to the extreme values by a line segment (known as whisker).
Example: Draw a box-and-whisker plot for the given data set: 23, 15, 5, 6, 12, 8, 16,
17, 22, 20, 9, 10.
Solution:
Arrange the values in increasing order.
5, 6, 8, 9, 10, 12, 15, 16, 17, 20, 22, 23
Identify the lowest and highest values and compute $Q_1$, the median,
and $Q_3$.
The lowest value is 5 and the highest value is 23.
\[ \text{location of } Q_1 = P_{25}: \ \frac{12(25)}{100} = 3 \rightarrow Q_1 = 8 \]
\[ \text{Median} = \frac{x_6 + x_7}{2} = \frac{12 + 15}{2} = 13.5 \]
\[ \text{location of } Q_3 = P_{75}: \ \frac{12(75)}{100} = 9 \rightarrow Q_3 = 17 \]
Follow steps 3 and 4 to illustrate the figure.
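The five values needed for the box-and-whisker plot can be collected with a small Python sketch. The helper `at_percentile` is hypothetical and handles only whole-number locations, which is all this example requires. Drawing the plot itself is left to a plotting tool.

```python
from statistics import median

data = [23, 15, 5, 6, 12, 8, 16, 17, 22, 20, 9, 10]
values = sorted(data)
n = len(values)

def at_percentile(i):
    """Value at location n * i / 100 (whole-number locations only)."""
    location = n * i / 100
    if location != int(location):
        raise ValueError("this sketch handles whole-number locations only")
    return values[int(location) - 1]

five_number_summary = {
    "lowest": values[0],
    "Q1": at_percentile(25),
    "median": median(values),
    "Q3": at_percentile(75),
    "highest": values[-1],
}
print(five_number_summary)
```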
Stem-and-leaf display
An informative arrangement of data where actual values of the observations
are displayed can be visualized through the use of the stem-and-leaf display.
Definition 13. A stem-and-leaf display is an organized diagram showing the
relative position of every element in the data set such that the leading digit(s)
become the stem and the trailing digit(s) become the leaf.
Example. The table lists the number of words used by 30 students in their reflection.
63 100 20 89 80 75 56 58 63 83
57 49 50 37 33 24 27 15 29 32
49 61 73 99 84 43 55 57 58 77
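One way to build a stem-and-leaf display for the data above is sketched below in Python. It assumes the tens digit(s) serve as the stem and the units digit as the leaf, which is a common choice but an assumption here. The layout of the printed rows is illustrative.

```python
from collections import defaultdict

words = [63, 100, 20, 89, 80, 75, 56, 58, 63, 83,
         57, 49, 50, 37, 33, 24, 27, 15, 29, 32,
         49, 61, 73, 99, 84, 43, 55, 57, 58, 77]

stems = defaultdict(list)
for value in sorted(words):
    stems[value // 10].append(value % 10)   # stem = leading digit(s), leaf = last digit

# Print one row per stem, from the smallest stem to the largest
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```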
Mathematics in the Modern World
Mathematics in Our World | Mathematics as a Tool: Data Management
NORMAL DISTRIBUTION
When most of the observations are near the “center” and the distribution of the
data is nearly the same on both sides of it, the distribution is said to follow a normal
distribution. This is one of the most commonly used distributions in Statistics and
has various applications.
deviation equal to one by using the formula $z = \frac{x - \mu}{\sigma}$.
Applications:
Example 5. During 1 week, an overnight delivery company found that the weights of
its parcels were normally distributed, with a mean of 24 oz and a standard deviation
of 6 oz.
a. What percent of the parcels weighed less than 42 oz?
b. What percent of the parcels weighed between 12 oz and 30 oz?
Solution:
Given: μ=24 , σ=6
a. $z = \frac{x - \mu}{\sigma} = \frac{42 - 24}{6} = 3$
$P(X < 42) = P(Z < 3) = 0.9987$ or 99.87%
This indicates that 99.87% of the parcels weighed less than 42 oz.
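The probabilities in Example 5 can be checked with `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). This is a sketch for verification only. For part (a), the cumulative probability at z = 3 evaluates to about 0.9987, and part (b) is computed as a difference of two cumulative probabilities.

```python
from statistics import NormalDist

# Parcel weights: mean 24 oz, standard deviation 6 oz
parcels = NormalDist(mu=24, sigma=6)

# Part (a): z-score for 42 oz and the probability of weighing less than 42 oz
z = (42 - parcels.mean) / parcels.stdev
p_less_than_42 = parcels.cdf(42)

# Part (b): probability of weighing between 12 oz and 30 oz
p_between_12_and_30 = parcels.cdf(30) - parcels.cdf(12)

print(z, round(p_less_than_42, 4), round(p_between_12_and_30, 4))
```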
Steps:
i. Go to Formulas then click Insert function. Then click OK.
ii. Click NORM.DIST. In the dialog box, input X value under X, the average
under MEAN, standard deviation under Standard dev and TRUE under
cumulative.
a. $z = \frac{x - \mu}{\sigma} = \frac{5000 - 5000}{1000} = 0$
The probability that an employee selected will have salary of less than
Php 5000 is 0.5 or 50 % .
b. $z_1 = \frac{x_1 - \mu}{\sigma} = \frac{5750 - 5000}{1000} = 0.75$ and $z_2 = \frac{x_2 - \mu}{\sigma} = \frac{6500 - 5000}{1000} = 1.5$
The chance that an employee selected will have a salary of more than
Php6,600 is 5.48%.
DO THESE!
Show a complete solution for each problem.
1. Given a normal distribution with µ = 50 and σ = 10, find the probability that X
assumes a value between 45 and 62.
2. Given a normal distribution with µ = 300 and σ = 50, find the probability that X
assumes a value greater than 362.
3. In the qualifying examination for the admittance to college, the mean score was
65 and the standard deviation was 8. If 1,265 students took the qualifying exam,
how many of them scored between 60 and 75?
4. Records show that in a certain hospital the distribution of the “length of stay” of
its patients is normal with a mean of 10.5 days and a standard deviation of 2
days.
a. What percentage of the patients stayed 8 days?
b. What is the chance that a patient will stay in the hospital between 9 and 11
days?
5. An electrical firm manufactures light bulbs that have a length of life that is
normally distributed with mean equal to 800 hours and a standard deviation of 40
hours. Find the probability that a bulb burns between 778 and 834 hours.
If all points in the scatter diagram lie on a straight line, the correlation is said to be
perfect. The degree or strength of the relationship between two variables may be
computed using the correlation coefficient, denoted by r. It is used primarily to
measure the degree of relationship between two variables that are linearly related.
Its value ranges from −1 to 1 and it is computed using the formula
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} \]
Diagram showing the positive, negative and zero correlation (Aufmann et al.).
Student             A   B   C   D   E   F   G   H   I   J
English grade       93  89  84  91  90  83  75  81  84  74
Mathematics grade   91  86  80  88  89  87  78  75  85  77
Solution:
Let x represent the English grade and y the Mathematics grade.
Student   x     y     x²      y²      xy
A         93    91    8649    8281    8463
B         89    86    7921    7396    7654
C         84    80    7056    6400    6720
D         91    88    8281    7744    8008
E         90    89    8100    7921    8010
F         83    87    6889    7569    7221
G         75    78    5625    6084    5850
H         81    75    6561    5625    6075
I         84    85    7056    7225    7140
J         74    77    5476    5929    5698
n = 10    ∑x = 844    ∑y = 836    ∑x² = 71614    ∑y² = 70174    ∑xy = 70839
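The column totals and the correlation coefficient can be verified with a short Python sketch that applies the formula for r term by term. The variable names are illustrative. With these totals, r comes out to roughly 0.85.

```python
from math import sqrt

english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 74]       # x
mathematics = [91, 86, 80, 88, 89, 87, 78, 75, 85, 77]   # y
n = len(english)

sum_x = sum(english)
sum_y = sum(mathematics)
sum_x2 = sum(x * x for x in english)
sum_y2 = sum(y * y for y in mathematics)
sum_xy = sum(x * y for x, y in zip(english, mathematics))

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))
```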
Using EXCEL in computing the correlation coefficient, simply follow these steps:
1. Click More Functions under Autosum, click CORREL then OK.
2. Click Array 1 then highlight all cell entries in X column.
3. Click Array 2 then highlight all cell entries in Y column.
This outcome will appear in the computer display.
Example 2. Find the equation of the regression line in example 1. Predict the grade
of the student in English if his Mathematics grade is 93.
Solution: From example 1,
$n = 10$, $\sum x = 844$, $\sum y = 836$, $\sum x^2 = 71614$ and $\sum xy = 70839$
The slope of the line is
\[ b = \frac{10(70839) - (844)(836)}{10(71614) - (844)^2} = \frac{2806}{3804} = 0.74. \]
Note that $\bar{x} = \frac{844}{10} = 84.4$ and $\bar{y} = \frac{836}{10} = 83.6$. So, the y-intercept of the line is
\[ a = \bar{y} - b\bar{x} = 83.6 - 0.74(84.4) = 21.144. \]
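The slope and intercept can be reproduced from the totals with a minimal Python sketch. Two intercepts are computed: one with the slope rounded to two decimal places, as in the solution above, and one at full precision, which differs slightly. The helper `predict` is hypothetical and simply evaluates the fitted line.

```python
n = 10
sum_x, sum_y = 844, 836
sum_x2, sum_xy = 71614, 70839

# Slope b of the regression line y = a + b x
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

x_bar, y_bar = sum_x / n, sum_y / n

# Intercept using the slope rounded to 0.74, as in the worked solution
intercept_rounded = y_bar - round(slope, 2) * x_bar

# Intercept using the unrounded slope
intercept = y_bar - slope * x_bar

def predict(x):
    """Predicted y for a given x on the full-precision fitted line."""
    return intercept + slope * x

print(round(slope, 2), round(intercept_rounded, 3), round(intercept, 3))
```

Rounding the slope before computing the intercept shifts the intercept by about 0.2, so predictions from the two versions of the line differ slightly.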
Using EXCEL application to process the data, simply modify the function used in
example 1. Instead of choosing CORREL, change it to the needed function like
SLOPE or INTERCEPT.
DO THESE!
1. The grades of a class of 9 students on a midterm report (x) and on the final
examination (y) are as follows:
x   77  50  71  72  81  94  96  99  67
y   82  66  78  34  47  85  99  99  68
a. Find the equation of the regression line.
b. Estimate the final examination grade of a student who received a grade of
85 on the midterm report.
cost pesos
(per Php1000) (per Php1000)
40 385
20 400
25 570
20 495
45 440
50 490
40 385
20 537
15 395
40 610
25 285
50 600
References:
Aufmann, Richard N., Joanne S. Lockwood, Richard D. Nation, and Daniel K. Clegg.
2013. Mathematical Excursions. 3rd ed. Brooks/Cole Cengage Learning, USA.
Johnson and Mowry. Mathematics: A Practical Odyssey.
Sobecki et al. Math in Our World.