Math 01 Module DATA MANAGEMENT Enhanced

This document discusses data management and statistics as important tools. It defines statistics as dealing with collecting, organizing, analyzing, and interpreting data to provide meaningful information. Descriptive statistics summarizes and presents data, while inferential statistics allows conclusions about populations based on samples. Measures of central tendency like the mean, median, and mode are introduced to summarize data sets. The mean is defined as the sum of all values divided by the number of observations. An example calculates the average score of 5 students.

Mathematics in Our World | Mathematics as a Tool: Data Management

Module 3
Mathematics as a Tool: Data Management

Contents

A. Basic Concepts in Statistics


B. Measures of Central Tendency
C. Measures of Dispersion
D. Measures of Relative Position
E. Normal Distribution
F. Linear Regression and Correlation

Department of Mathematics
College of Arts and Sciences
Mariano Marcos State University

2020

Mathematics in the Modern World



Introduction

“It is easy to lie with statistics. It is hard to tell the truth without statistics.”
Andrejs Dunkels

Data management is a process by which information is acquired and
processed to ensure the accessibility and reliability of the data for its users. One of
the most important tools in processing and managing such information is statistics.
Statistics is utilized in most areas of human endeavor. It is commonly used in education,
research, business, agriculture, and other fields, and even in everyday life activities.

Definition 1: Statistics is a science which deals with the collection, organization,
presentation, analysis, and interpretation of data so as to give more meaningful
information.

Data, or pieces of information, may be collected by conducting a survey, an
interview, an observation, or an experiment. The data gathered can be properly
organized and presented graphically by a line graph, bar graph, or pictograph, or with
the aid of a statistical table known as a frequency distribution table (FDT). A concise
and meaningful conclusion is obtained from the analysis and interpretation of data.
Relevant information can be deduced from the analysis of numerical descriptions,
and predictions about the whole population may be made based on a small group.
Because the work of statistics covers such a wide area of concern, statistics is
subdivided into two branches, namely: descriptive statistics and inferential statistics.

Definition 2: Descriptive statistics refers to the collection, organization,
summary, and presentation of data, while inferential statistics deals with the
interpretation and analysis of data where a conclusion is drawn based on a
subset of the population.

In descriptive statistics, a set of data is simply described without drawing any
inferences or implications. The data are merely summarized and discussed in a clear,
concise, and informative manner. In inferential statistics, information or inferences
concerning a large group known as the population are provided based on the study of a
representative group or selected members of the population, which are identified as the
sample. Calculating the average rating of a class of 40 students in Math 01
illustrates descriptive statistics, while determining the performance of the same
class based on the performance of 10 randomly selected members of the class
exhibits inferential statistics.

BASIC TERMS
Some of the basic terminologies and notations involved in statistics are the
following:
a. Population - a collection or set of things or objects under consideration
b. Sample - a subset or representative group of the population
c. Data - refers to the information gathered in a research study
Statistical data are classified according to their sources, namely: primary data
or secondary data.
• Primary data – information gathered from respondents by the researcher
himself.
• Secondary data – information obtained from published materials or data
gathered by other individuals or agencies. These are the data which are
transcribed from original sources.
d. Array – listing of observations which are arranged in an increasing or
decreasing magnitude
e. Parameter - a value which is computed from a population
f. Statistic – a value which is computed from a sample
g. Variable – a characteristic of interest that has been observed or measured on
every member of the population or sample.
A variable may be quantitative or qualitative where quantitative variable is
further classified as discrete or continuous.
i. Quantitative/Numerical variable – describes the amount or number of
an element of a sample or population
• Discrete – takes on a countable number of values (it is usually expressed as a
whole number)
Example: number of books owned by a student
• Continuous – measured on a continuous scale (it takes any value
within a range or interval)
Example: height of the students (in feet)
ii. Qualitative/Categorical variable – describes the quality, category, or
character of an element of a population or sample


Examples:
gender (male or female)
hair color (black, brown, blonde)
level of satisfaction of a student on his grade (highly satisfied,
satisfied, not satisfied)

Levels of Measurement
A more detailed distinction, termed as the levels of measurement, is used by
some researchers in examining the information that is collected. It is classified as
follows:
1. Nominal Measurement - numbers or symbols are used to code or classify
each element in the population. Note that the assigned numbers have no
numerical meaning.
Examples: gender, educational background, employment status

2. Ordinal Measurement – uses numerical categories that express a
meaningful order. There is no indication of distance between positions. The
numbers become meaningful because they reveal whether one class or
category is more or less than the other. Categories are ranked according to
the order of their value on the property like first, second, third; oldest, next
oldest, youngest.
Example: rank in a beauty contest

3. Interval Measurement – has equal intervals. There is significance to the
distance between any two values. It tells us that one unit differs by a certain
amount of the property from another unit. It has no absolute zero.
Examples: aptitude test, temperature

4. Ratio Measurement – A variable measured at this level not only includes the
concepts of order and interval, but also includes the idea of ’nothingness’, or
absolute zero.
Example: Measurement of height, weight, ages
Remark: The scale of measurement depends mainly on the method of
measurements and not on the property being measured.

For instance, the weight of a pack of milk measured in kilograms has a
ratio scale, but if the packs are labelled as small, medium, or large, the
weight is measured on an ordinal scale.

Measures of Central Tendency

One way of summarizing data is to describe the data set by using
descriptive measures. Among the most commonly used descriptive measures
are the measures of central tendency and the measures of
dispersion.

Definition 3: A measure of central tendency (or central location) is a single
value that is used to identify the “center” of the data set or set of observations.

The three measures of central tendency are the mean, median, and mode.

Definition 4: The mean, also known as the arithmetic average, is the sum of all the
observed values divided by the number of observations in the data set. It can be
computed as
\[ \mu = \frac{\sum_{i=1}^{n} x_i}{n} \]
where $x_i$ is the $i$th observation and $n$ is the
number of observations in the data set.

The mean is the most familiar measure of the “center”. The mean of the
population is symbolized by the lowercase Greek letter “mu”, $\mu$, while the
mean of the sample is represented by $\bar{x}$ (x-bar).

Example 1: The scores of five students who are selected randomly in a class of
Math 01 are as follows: 44, 37, 41, 35 and 32. Find their average score.
Solution:
Applying the formula for the mean of ungrouped data gives
\[ \bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8. \]
Hence, the average score of the five students is 37.8.
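
The computation in Example 1 can also be checked with a short Python sketch. The helper name `arithmetic_mean` is only an illustrative choice and is not part of the module; `statistics.mean` is the standard-library function for the same quantity.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

scores = [44, 37, 41, 35, 32]      # scores of the five students in Example 1

print(arithmetic_mean(scores))     # 37.8
print(mean(scores))                # 37.8 (standard-library equivalent)
```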


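The means of subgroups can be combined to come up with the group mean
known as the weighted mean. This can be calculated using the formula
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
where
$x_i$ is the $i$th observation,
$f_i$ is the frequency or weight of the $i$th observation, and
$n$ is the number of observations being combined, so that the denominator
$\sum f_i$ is the total of the frequencies or weights.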

Example 2: If the final examination of a class in statistics is given the weight 2, the
average of the quizzes the weight 3, and a project report the weight 1, what would be the
mean grade of a student who got the grades 90, 85 and 87, respectively?
Solution:
\[ \bar{x} = \frac{2(90) + 3(85) + 1(87)}{2 + 3 + 1} = \frac{522}{6} = 87 \]
The mean grade of the student is 87.
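
As a rough check of the weighted mean formula, the sketch below multiplies each grade by its weight and divides by the total weight. The function name `weighted_mean` is an assumption made for illustration only.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

grades = [90, 85, 87]    # final examination, average of quizzes, project report
weights = [2, 3, 1]      # weights given in Example 2

print(weighted_mean(grades, weights))   # 87.0
```

When all the weights are equal, the same function reduces to the ordinary mean.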
Remarks:
1. The mean may not be an actual observation in the data set.
2. The mean reflects the magnitude of every observation since every observation
contributes to the value of the mean.
3. The mean is not a good measure of central tendency if there is an extreme value
or observation since it is easily affected by extreme values. The best measure of
center for this case is the median.
Definition 5: The median is a single value which divides an array of observations into
two equal parts such that 50% of the observations fall above it and the remaining
50% fall below it. It is written symbolically as $\tilde{x}$, read as “x-tilde”.

The median of a data set consisting of an odd number of observations is the
middlemost value in the list. That is,
\[ \tilde{x} = x_{\frac{n+1}{2}} \]
where $n$ is the number of observations.

If $n$ is even, the median is the average of the two middlemost values. It can be
computed as
\[ \tilde{x} = \frac{m_1 + m_2}{2} \]
where $m_1$ and $m_2$ are the two middlemost values. Take note
that the observations are first arranged in an array (from lowest to highest) before
getting the median value.

Example 1: The number of books owned by the eleven children are as follows: 5, 2, 4, 6,
5, 10, 7, 6, 9, 8, 6. What is the median?
Solution:
Arrange the data in an array form: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list
contains 11 numbers, the median is the middlemost value (the 6th number), which is 6.

Example 2: Compute the median of the data set: 2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7
Solution:
Forming an array, we have 2.5, 2.5, 3.5, 3.7, 4.0, 5.8, 7.1, 8.2. There are
$n = 8$ values; hence, the median is calculated as
\[ \tilde{x} = \frac{x_4 + x_5}{2} = \frac{3.7 + 4.0}{2} = 3.85. \]
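
The two cases (odd and even number of observations) can be sketched in Python as follows. The helper `median_by_position` is a hypothetical name used only here; `statistics.median` from the standard library follows the same rule.

```python
from statistics import median

def median_by_position(values):
    """Sort the data, then take the middle value (odd n) or the
    average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                        # the (n + 1)/2-th value
    return (ordered[middle - 1] + ordered[middle]) / 2

books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]                 # Example 1, n = 11
readings = [2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7]        # Example 2, n = 8

print(median_by_position(books))       # 6
print(median_by_position(readings))    # approximately 3.85
print(median(books), median(readings)) # same results from the standard library
```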
Remarks:
1. The median value may not be an actual observation in the data set.
2. The median is a positional value, hence, it is not affected by the presence of
extreme observations.
3. When the data are qualitative, the median is not a possible measure, so the
center is described by determining the mode.

Definition 6: The mode is an observation that occurs most frequently in the given
data set.

Example 1: Find the mode in the following sets of scores.


a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10
Solution:
The mode in set A is 45 because 45 occurs most frequently in the list. Both 6 and
11 occur most often in set B; therefore, set B has two modes, 6 and 11. The
modes in set C are 25, 37 and 45 since these numbers have the highest frequency. Each
element in set D has the same number of occurrences; thus, the data set has no mode.
The distribution of data may be classified as unimodal, bimodal, trimodal or
multimodal depending upon the number of modal values in the given data
set. In the above example, set A is unimodal, set B is bimodal and set C is trimodal.

Example 2: What is the modal color of the shirt worn by the students if the data gathered
were as follows: white, gray, gray, black, white, red, red, gray, black, white, white, red,
gray, red, gray, black, red, red, gray, gray, black?
Solution:
Since gray has the highest frequency, it follows that the modal color of the shirt
worn by the students is gray.
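
A minimal sketch of how the modes in Examples 1 and 2 could be found by counting frequencies is given below. The function `modes` is a hypothetical helper; it follows the convention used in Example 1 that a data set whose values all occur equally often has no mode, which is an assumption built into the code.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # Convention from Example 1 (set D): if every distinct value occurs
    # equally often, the data set is treated as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]
shirts = ["white", "gray", "gray", "black", "white", "red", "red", "gray",
          "black", "white", "white", "red", "gray", "red", "gray", "black",
          "red", "red", "gray", "gray", "black"]

print(modes(set_a))    # [45]
print(modes(set_b))    # [6, 11]
print(modes(set_c))    # [25, 37, 45]
print(modes(set_d))    # []
print(modes(shirts))   # ['gray']
```

Because only frequencies are counted, the same function works for both quantitative and qualitative data.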

Remarks:
1. The mode can be used for both quantitative and qualitative data.
2. It is very much affected by the method of grouping.
3. It is determined by the frequency and not by the values of the observations.

DO THESE!
1. Company ABC is awarding the top ten most outstanding workers in their
company every year. The ages of the top ten awardees for the year 2018 are 47,
53, 36, 60, 30, 28, 42, 43, 38 and 52. Determine the mean, median and mode of
the ages.
2. The mean weight of 50 Balikbayan boxes is 135 kg. What is the approximate
total weight of all the boxes?
3. The average height of the four basketball players is 74 inches. If the height of the
three players are 69 inches, 72 inches and 78 inches, what is the height of the
fourth player?
4. What is the median of the distribution given by 23, 17, 12, 8, 14, 25, 19, 22, 18?
If the maximum value is replaced by 40, what effect will this have on the median?
How about if the minimum is replaced by 0?
5. The final grades of a student in the six subjects she enrolled in last semester are shown
below.
Subject        Number of Units   Final Grade
Calculus 1     5                 2.25
English 3      3                 2.0
Psychology 1   3                 1.5
Finance 2      3                 2.0
Accounting 3   6                 2.25
Humanities     3                 1.75

Determine her average grade. If the subjects were of equal number of units, what
would be her average?

MEASURES OF DISPERSION
In some cases, describing the data using the measures of central tendency
alone is not enough to provide sufficient information concerning a population or
sample. It should be supplemented by an analysis of how the individual elements of
the population/sample tend to cluster around the center. Thus, an
analysis of the variability of the observations may be applied.

Definition 7: A measure of dispersion (or measure of variation) is a quantity that
measures the spread or variability of the values in a given set of data.

The most commonly used measures of dispersion are the range, variance,
and standard deviation. The simplest and easiest to compute, but only a rough
estimate of dispersion, is the range.

Definition 8: The range, R, is the difference between the highest value (H) and
lowest value (L) in the data set. That is, R = H – L.

Example 1. Compare the performances of the three students based on their ratings
(in percent) in the 5 long tests.
Solution:
Student A : 83, 80, 89, 78, 70    μ = 80    R = 19
Student B : 78, 79, 80, 81, 82    μ = 80    R = 4
Student C : 80, 80, 80, 80, 80    μ = 80    R = 0

In terms of the measure of central tendency, the three students perform equally since
they have the same average rating of 80%. However, looking at the variability of their
ratings, Student A has the highest range. This
shows that the ratings of Student A are more dispersed than those of the others. The ratings of
Student A fluctuate while those of Student B are evenly spaced close to the mean. On the other
hand, Student C has a range equal to zero, so his ratings are all concentrated at the
mean, indicating that the distribution has no spread.

Example 2. The average daily allowances (in pesos) of 12 college students studying
at University Y are 112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25
and 113. Find the range.
Solution:
Given: H = 165.5 and L = 99.75, then the range is R = 165.5 − 99.75 = 65.75.
The range of the daily allowances of the 12 college students is 65.75 pesos.
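
The range is simple enough to verify directly. The sketch below uses a hypothetical helper named `data_range` and the data of the two examples on the range.

```python
def data_range(values):
    """Range R = H - L: highest value minus lowest value."""
    return max(values) - min(values)

allowances = [112, 127, 118, 147.5, 165.5, 99.75,
              150, 145, 145, 102, 136.25, 113]

print(max(allowances), min(allowances))     # 165.5 99.75
print(data_range(allowances))               # 65.75
print(data_range([83, 80, 89, 78, 70]))     # 19 (Student A)
print(data_range([80, 80, 80, 80, 80]))     # 0  (Student C)
```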

Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.

A more reliable measure in describing the spread of a set of observations is
the standard deviation. Most researchers use this measure in the treatment of data.
Its computation includes all the values in the data set.

Definition 9: The standard deviation is the positive square root of the variance.
The variance is the average of the squared deviations of every observation from
the mean.

The standard deviation and variance can be obtained from a population or a
sample, but most applications use the sample rather than the population because
the latter requires complete enumeration. The unit of the variance is the squared unit
of the data, while that of the standard deviation is the same as the unit of the data set.
The following symbols are used to designate these measures for a population and a sample.

                     Population   Sample
Standard deviation   σ            s
Variance             σ²           s²

The variance and standard deviation of a population are calculated by using
the formulas below.
Variance and Standard Deviation of a Population: Let $x_1, x_2, x_3, \ldots, x_N$ be
the $N$ elements of a population. Then, the population variance is
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
and the population standard deviation is $\sigma = \sqrt{\sigma^2}$.

Sample Variance: Let $x_1, x_2, x_3, \ldots, x_n$ be a random sample of $n$
observations. Then, the sample variance is
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
and the standard deviation of the sample is $s = \sqrt{s^2}$.
Example 1: The following are the scores of a student in all her long exams in
Calculus: 83, 80, 89, 78, and 70. Calculate the standard deviation.
Solution:
$x_i$     $x_i - \mu$     $(x_i - \mu)^2$
83          3               9
80          0               0
89          9              81
78         −2               4
70        −10             100
Total 400                 194
N = 5, μ = 80

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{194}{5} = 38.8 \quad \text{(variance)} \]
\[ \sigma = \sqrt{38.8} \approx 6.23 \quad \text{(standard deviation)} \]
The standard deviation of the population is 6.23.
The result indicates that, on the average, the percentage scores of the student
tend to deviate from the mean by about 6.23 units.
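
The table above can be reproduced with a short sketch that applies the population formula step by step. The function name `population_variance` is chosen only for illustration; `statistics.pvariance` and `statistics.pstdev` are the standard-library counterparts.

```python
from math import sqrt
from statistics import pvariance, pstdev

def population_variance(values):
    """Average of the squared deviations from the mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

scores = [83, 80, 89, 78, 70]          # long exam scores from Example 1

sigma_squared = population_variance(scores)
sigma = sqrt(sigma_squared)

print(sigma_squared)                   # 38.8
print(round(sigma, 2))                 # 6.23
print(pvariance(scores), round(pstdev(scores), 2))   # 38.8 6.23
```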

Example 2: The following data were obtained by sampling from a population.
10 12 14 15 17 18 18 24
Find the variance and the standard deviation of the sample.
Solution:
$x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
10         −6                  36
12         −4                  16
14         −2                   4
15         −1                   1
17          1                   1
18          2                   4
18          2                   4
24          8                  64
Total 128                     130
n = 8, $\bar{x} = 16$

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{130}{7} \approx 18.57 \]
\[ s = \sqrt{s^2} = \sqrt{\frac{130}{7}} \approx 4.31 \]

The variance is approximately 18.57 while the standard deviation is approximately 4.31.
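
For comparison with the population case, the sketch below divides by n − 1 instead of N, as in the sample formula. The helper `sample_variance` is a hypothetical name; `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
from math import sqrt
from statistics import variance, stdev

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

sample = [10, 12, 14, 15, 17, 18, 18, 24]   # data from Example 2

s_squared = sample_variance(sample)
s = sqrt(s_squared)

print(round(s_squared, 2))    # 18.57
print(round(s, 2))            # 4.31
print(round(variance(sample), 2), round(stdev(sample), 2))   # 18.57 4.31
```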


What can you infer from this?
Remarks: A large standard deviation indicates that, on the average, the
data values are far from the mean, while a small standard deviation
shows that, on the average, the data values are close to the mean.

DO THESE!
Answer the following. Show a complete and neat solution for each problem.

1. An interview was conducted with a class of 20 college students to determine the
number of books owned by the students. The data gathered are as follows:
4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12. Treating
the data as a population, calculate the standard deviation.
2. (Adapted from Mathematics: A Practical Odyssey). To settle an argument over
who is the better bowler between Danny and George, the two agreed to bowl
six games, and whoever has the higher “average” will be the better bowler. Their
bowling scores are presented in the table below. Compute and compare their
averages. Who is the better bowler?

George 185 135 200 185 250 155
Danny  182 185 188 185 180 190

3. (Mathematical Excursions by Aufmann). A consumer testing agency has tested
the strengths of 3 brands of 1/8-inch rope. The results of the tests are
shown in the following table. According to the test results, which
company produces 1/8-inch rope for which the breaking point has the
smallest standard deviation?
Company        Breaking point of 1/8-inch rope (in pounds)
Trustworthy    122, 141, 151, 114, 108, 149, 125
Brand X        128, 127, 148, 164, 97, 109, 137
NeverSnap      112, 121, 138, 131, 134, 139, 135

4. Ten used trail bikes are randomly selected from a bike shop, and the
odometer reading of each is recorded as follows.
1,902, 103, 653, 1,901, 788, 361, 216, 363, 223, 656
Solve for the standard deviation and interpret.

Measures of Relative Position

A statistical tool which is significant in identifying the position of an
observation relative to the other elements in a given data set is the measure of relative
position.

Definition 10: A measure of relative position is a statistical measure that
provides the specific location of an observation relative to the other values when
the data are in ranked order.

This measure divides the data set into subgroups such that a specific portion
of the data set belongs to the lower bracket and the remainder to the higher bracket.
Percentiles, deciles, and quartiles are among the most commonly used measures of
relative position.

In determining the desired measure, the data must first be arranged in
increasing order. Percentiles divide the entire set of observations at 99
points located at $P_1, P_2, \ldots, P_{99}$, where 1% of the total
observations are lower than $P_1$ and the remaining 99% are higher than $P_1$, 2%
of the total observations are found below $P_2$ and 98% are above it, and so on.

Analogous to this, quartiles have the subdivisions described by $Q_1$ (the first
quartile, which has 25% of the observations falling below it and the remaining 75%
above it), $Q_2$ (the second quartile, which is equal to the median and has 50% of
the observations below it), and $Q_3$ (the third quartile, with 75% of the total
observations falling below it and the remaining 25% lying above it).

The portions of deciles are the 1st decile ($D_1$), 2nd decile ($D_2$), $\ldots$,
and 9th decile ($D_9$). The lowest decile $D_1$ corresponds to a value in the set
wherein 10% of all the observations are located below $D_1$, the second decile
$D_2$ corresponds to a value for which 20% of the observations are lower than
$D_2$, and so on up to the last decile $D_9$, which has a value positioned near
the top such that 90% of all the observations are located below the value
corresponding to $D_9$.

Definition 11: Formula for the Percentile
The location of the $i$th percentile $P_i$ in ungrouped data consisting of $n$
observations arranged in increasing order can be computed as
\[ P_i = \frac{n(i)}{100}. \]

Remarks:
1. The quartiles and deciles can be determined by solving for their equivalent percentiles.
a. $Q_1 = P_{25}$; $Q_2 = P_{50}$; $Q_3 = P_{75}$.
b. $D_1 = P_{10}$; $D_2 = P_{20}$; $D_3 = P_{30}$; $\ldots$; $D_9 = P_{90}$.
2. Given a data set, Median $= P_{50} = Q_2 = D_5$.

Example 1: Joy was told that relative to the other scores on a long exam in Statistics,
her score was at the 95th percentile. This means that at least 95% of those who took the
test had scores less than or equal to Joy’s score, while at least 5% had scores greater than
or equal to Joy’s.

Example 2: Given the following data set: 25, 5, 6, 12, 8, 16, 17, 22, 20, 9. Compute
for
a) 20th percentile    c) first quartile    e) 3rd decile
b) 56th percentile    d) 2nd quartile      f) seventh decile
Solutions:
Arrange the scores in increasing order.
5, 6, 8, 9, 12, 16, 17, 20, 22, 25
a) 20th percentile
$n = 10$, $i = 20$
\[ P_{20} = \frac{10(20)}{100} = \frac{200}{100} = 2 \quad \text{(location of the 20th percentile)} \]
This means that the 20th percentile is the second score from the lowest.
So, $P_{20} = 6$.

b) 56th percentile
\[ P_{56} = \frac{10(56)}{100} = \frac{560}{100} = 5.6 \approx 6 \]
When the result is not exact, round it to the nearest whole number. The
56th percentile is approximately described by the 6th value in the data set.
Thus, $P_{56} = 16$.
Note: Interpolation may be applied to find a more exact value
corresponding to the 56th percentile. A location of 5.6 means that the 56th
percentile is between the 5th and 6th values. To interpolate, multiply the
difference of the 5th and 6th values by the decimal part, then add the result to
the 5th value. That is, $(16 - 12) \times 0.6 = 2.4$. So, $P_{56} = 12 + 2.4 = 14.4$, which
is the interpolated value.
c) first quartile
\[ P_{25} = \frac{10(25)}{100} = \frac{10}{4} = 2.5 \]
The location 2.5 is halfway between the 2nd and 3rd values in the list (6 and 8). So,
$P_{25} = 7$. Since $Q_1 = P_{25}$, therefore $Q_1 = 7$.

d) 2nd quartile
Note that $Q_2$ has the same value as the median. Solving for the
median gives
\[ Md = \frac{12 + 16}{2} = 14. \]
So, $Q_2 = 14$.
e) 3rd decile
\[ P_{30} = \frac{10(30)}{100} = 3 \quad \text{(3rd value from the lowest)} \]
Therefore, $D_3 = 8$.
f) seventh decile
\[ P_{70} = \frac{10(70)}{100} = 7 \quad \text{(7th number in the list)} \]
The seventh decile is 17.
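
The location rule used in parts a), b), c), e) and f) can be sketched in Python as shown below. The function `percentile` is a hypothetical helper written only for this illustration: it computes the location n(i)/100, reads the value directly when the location is a whole number, and interpolates otherwise. Other software packages use different percentile conventions and may return slightly different values.

```python
def percentile(data, i):
    """Value of the i-th percentile (i an integer from 1 to 99) using
    the location rule n * i / 100 on the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    # Split the location n*i/100 into its whole part and decimal part.
    whole, remainder = divmod(n * i, 100)
    fraction = remainder / 100
    if whole < 1:
        return ordered[0]               # location falls before the first value
    if fraction == 0:
        return ordered[whole - 1]       # exact location: read the value
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)   # interpolate

scores = [25, 5, 6, 12, 8, 16, 17, 22, 20, 9]

print(percentile(scores, 20))             # 6
print(round(percentile(scores, 56), 2))   # 14.4 (interpolated)
print(percentile(scores, 25))             # 7.0  (first quartile)
print(percentile(scores, 30))             # 8    (3rd decile)
print(percentile(scores, 70))             # 17   (seventh decile)
```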

Box - and - Whisker Plot

Definition 12: A diagram showing the representation of a 5-point summary of a
data set specified by the lowest and the highest values, the values corresponding
to $Q_1$ and $Q_3$, and the median is called a box-and-whisker plot, also
known as a box plot.

The five important numbers are arranged in increasing order on a horizontal or
vertical scale. Diagrammatically, we have

[Diagram of a box-and-whisker plot, from Mathematical Excursions by Aufmann]

Here is a summary of the construction of a box plot.

Steps in the Construction of a Box-and-Whisker Plot
1. Arrange the values in increasing order.
2. Compute $Q_1$, the median, and $Q_3$.
3. Locate the five numbers (lowest and highest values, $Q_1$, median, and
$Q_3$) on the number line and draw a rectangle (box) above the scale
covering $Q_1$, the median, and $Q_3$, then draw a line segment across the box
passing through the median.
4. Connect the box to the extreme values by line segments (known as whiskers).

Example: Draw a box-and-whisker plot for the given data set: 23, 15, 5, 6, 12, 8, 16,
17, 22, 20, 9, 10.
Solution:
• Arrange the values in increasing order.
5, 6, 8, 9, 10, 12, 15, 16, 17, 20, 22, 23
• Identify the lowest and highest values and compute $Q_1$, the median,
and $Q_3$.
The lowest value is 5 and the highest value is 23.
\[ Q_1 = P_{25} = \frac{12(25)}{100} = 3 \;\rightarrow\; Q_1 = 8 \]
\[ \text{Median} = \frac{x_6 + x_7}{2} = \frac{12 + 15}{2} = 13.5 \]
\[ Q_3 = P_{75} = \frac{12(75)}{100} = 9 \;\rightarrow\; Q_3 = 17 \]
Follow steps 3 and 4 to illustrate the figure.
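
The five numbers needed for the plot can be collected with a short sketch. The helper names `value_at_location` and `five_number_summary` are assumptions made for illustration; the quartiles follow the same location rule n(i)/100 used in the example, which may differ from the conventions of plotting software.

```python
from statistics import median

def value_at_location(ordered, location):
    """Read the value at a 1-based location, interpolating if needed."""
    whole = int(location)
    fraction = location - whole
    if whole < 1:
        return ordered[0]
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def five_number_summary(data):
    """Lowest value, Q1, median, Q3, highest value."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_location(ordered, n * 25 / 100)
    q3 = value_at_location(ordered, n * 75 / 100)
    return ordered[0], q1, median(ordered), q3, ordered[-1]

data = [23, 15, 5, 6, 12, 8, 16, 17, 22, 20, 9, 10]

print(five_number_summary(data))   # (5, 8, 13.5, 17, 23)
```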

Stem-and-leaf display
An informative arrangement of data where actual values of the observations
are displayed can be visualized through the use of the stem-and-leaf display.

Definition 13: A stem-and-leaf display is an organized diagram showing the
relative position of every element in the data set such that the leading digit(s)
become the stem and the trailing digit(s) become the leaf.

Example. The table lists the number of words used by 30 students in their reflection.
63 100 20 89 80 75 56 58 63 83
57 49 50 37 33 24 27 15 29 32
49 61 73 99 84 43 55 57 58 77
Draw a stem-and-leaf display of these data.
Solution:
Stem | Leaf
   1 | 5
   2 | 0 4 7 9
   3 | 2 3 7
   4 | 3 9 9
   5 | 0 5 6 7 7 8 8
   6 | 1 3 3
   7 | 3 5 7
   8 | 0 3 4 9
   9 | 9
  10 | 0
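
The display above can be generated with a few lines of Python. This is only a sketch: the function name `stem_and_leaf` is hypothetical, and it assumes whole-number data where the last digit is the leaf and the remaining leading digits form the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)

words = [63, 100, 20, 89, 80, 75, 56, 58, 63, 83,
         57, 49, 50, 37, 33, 24, 27, 15, 29, 32,
         49, 61, 73, 99, 84, 43, 55, 57, 58, 77]

display = stem_and_leaf(words)
for stem, leaves in display.items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted first, the stems appear in increasing order and the leaves within each stem are also in increasing order.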
DO THESE!
1. An interview was made to a class of 20 college students to determine the
number of books owned by the students. The data gathered are as follows:
4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12.
a. Solve for the following measures and interpret the result.
i. P45 ii. Q1 iii. D4
b. Construct a box-and-whiskers plot.
c. Create the stem-and-leaf display.
2. Consider the scores of the two bowlers in the previous exercise.
George 185 135 200 185 250 155
Danny 182 185 188 185 180 190
a. Compare their scores that correspond to i) Q3 ii) D7
b. If the scores of Danny and George are combined to form a single
population, compute i) P42 ii) P70 .

NORMAL DISTRIBUTION

When most of the observations are near the “center” and the distribution of
data is nearly the same on both sides, the distribution is said to follow a normal
distribution. This distribution is one of the most commonly used distributions in the
field of Statistics and has various applications.

Definition 14: A normal distribution, also known as the Gaussian distribution, is a
continuous probability distribution which is drawn graphically by a smooth bell-shaped
curve called the normal curve having an area under it which is equal to
one.

Properties of a Normal Distribution


Any normal distribution has the following properties:
1. The total area under the normal curve is one.
2. The three measures of central tendency given by the mean, median and
mode are all equal.
3. It is symmetric with respect to the vertical line X =μ .
4. The curve is asymptotic with respect to the horizontal axis on both
directions.
The proportion of values in a normally distributed data set that fall within a given
distance of the mean is based on the mean and the standard deviation of the data set. That is,
• about 68% of the observations fall within 1 standard deviation of the
mean;
• about 95% of the observations fall within 2 standard deviations of the
mean; and
• about 99.7% of the observations fall within 3 standard deviations of
the mean.
The diagram shows the different percentages defined by the empirical rule
for normal distributions.

[Diagram of the empirical rule for normal distributions, from Mathematical Excursions by Aufmann]
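
The three percentages of the empirical rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the helper name `proportion_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rounded figures 68%, 95% and 99.7% are approximations of these areas.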

Every normal distribution has its own mean and standard deviation, so areas
(probabilities) based on the standard normal distribution will be used.

Definition: A standard normal distribution is the distribution of a normal random variable
with mean zero and standard deviation equal to one. That is, $Z \sim N(0, 1)$.

A normal random variable $X$ with mean $\mu$ and standard deviation $\sigma$ can be
transformed into a standard normal variable $Z$ with mean zero and standard
deviation equal to one by using the formula
\[ z = \frac{x - \mu}{\sigma}. \]

Rules in Finding the Areas Under the Normal Curve

Case 1. P(Z < z₁)
When the area under the curve is located to the left of z₁, simply read the
value corresponding to z₁ in the table of areas under the normal curve.
Example: 1. Find the area to the left of 1.76.
2. Give the probability P(Z < 2.50).
Solutions:
1. Using the z table, P(Z < z₁) = P(Z < 1.76) = 0.9608
By using EXCEL, follow the steps below:
Steps: i. Go to Formulas then click Insert function.
ii. Select the function NORM.S.DIST then press OK.
iii. In the dialog box, input the z value under z and TRUE under
cumulative, then click OK to get the result.

2. P(Z < 2.50) = 0.9938 or 99.38%

Solving for the probability using EXCEL, =NORM.S.DIST(2.5, TRUE), gives 0.9938.
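
The same left-tail areas can be reproduced outside EXCEL. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_left` is an assumption for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z1: float) -> float:
    """Case 1: area under the standard normal curve to the left of z1."""
    return Z.cdf(z1)

print(round(area_left(1.76), 4))
print(round(area_left(2.50), 4))
```

Rounded to four decimal places, the results match the z-table values used in the two examples.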


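Case 2. P(Z > z₁) = 1 − P(Z < z₁)

This is applied when the area under the curve is located to the right of z₁.
Example 3. Find P(Z > 2.50).
Solution:
P(Z > 2.50) = 1 − P(Z < 2.50)
= 1 − 0.9938
= 0.0062 or 0.62%

By computer application, =1-NORM.S.DIST(2.5, TRUE), the probability is 0.0062.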

Case 3. P(z₁ < Z < z₂) = P(Z < z₂) − P(Z < z₁)

This is applied when the area is bounded between two values in an interval.
Example 4. What is the area bounded between Z = −1.22 and Z = 2.03?
Solution:
P(−1.22 < Z < 2.03) = P(Z < 2.03) − P(Z < −1.22)
= 0.9788 − 0.1112
= 0.8676

Applying the computer application,
=NORM.S.DIST(2.03, TRUE)-NORM.S.DIST(-1.22, TRUE) gives 0.8676.
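
Cases 2 and 3 both reduce to left-tail areas, which the following sketch makes explicit. It assumes Python's standard-library `statistics.NormalDist`; the helper names `area_right` and `area_between` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_right(z1: float) -> float:
    """Case 2: P(Z > z1) = 1 - P(Z < z1)."""
    return 1.0 - Z.cdf(z1)

def area_between(z1: float, z2: float) -> float:
    """Case 3: P(z1 < Z < z2) = P(Z < z2) - P(Z < z1)."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(area_right(2.50), 4))           # Example 3
print(round(area_between(-1.22, 2.03), 4))  # Example 4
```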


Applications:

Example 5. During 1 week, an overnight delivery company found that the weights of
its parcels were normally distributed, with a mean of 24 oz and a standard deviation
of 6 oz.
a. What percent of the parcels weighed less than 42 oz?
b. What percent of the parcels weighed between 12 oz and 30 oz?
Solution:
Given: μ = 24, σ = 6

a. $z = \frac{x - \mu}{\sigma} = \frac{42 - 24}{6} = 3$
P(X < 42) = P(Z < 3) = 0.9987 or 99.87%
This indicates that 99.87% of the parcels weighed less than 42 oz.

b. $z_1 = \frac{x_1 - \mu}{\sigma} = \frac{12 - 24}{6} = -2$ and $z_2 = \frac{x_2 - \mu}{\sigma} = \frac{30 - 24}{6} = 1$
P(12 < X < 30) = P(−2 < Z < 1)
= P(Z < 1) − P(Z < −2)
= 0.8413 − 0.0228
= 0.8185 or 81.85%
Therefore, 81.85% of the parcels weighed between 12 oz and 30 oz.
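
The parcel example can also be checked without standardizing by hand. The sketch below, which assumes Python's standard-library `statistics.NormalDist`, works directly with the mean of 24 oz and standard deviation of 6 oz given in the problem.

```python
from statistics import NormalDist

parcels = NormalDist(mu=24, sigma=6)  # parcel weights in ounces

# a. P(X < 42)
p_less_than_42 = parcels.cdf(42)

# b. P(12 < X < 30) = P(X < 30) - P(X < 12)
p_between_12_and_30 = parcels.cdf(30) - parcels.cdf(12)

print(f"{p_less_than_42:.4f}")
print(f"{p_between_12_and_30:.4f}")
```

The unrounded computation for part (b) gives about 0.8186, while subtracting the four-decimal table entries gives 0.8185. The small gap comes from rounding in the table, not from a different method.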

Solution using EXCEL:


a. What percent of the parcels weighed less than 42 oz?

Steps:
i. Go to Formulas then click Insert function. Then click OK.


ii. Click NORM.DIST. In the dialog box, input X value under X, the average
under MEAN, standard deviation under Standard dev and TRUE under
cumulative.

iii. Click OK to get the result.


b. What percent of the parcels weighed between 12 oz and 30 oz?


Find the probabilities of the two x values separately by following the steps in
(a), then subtract: P(X < 30) − P(X < 12). The result is shown below.

Example 6. The salaries of employees of a certain company in Metro Manila are
normally distributed with a mean of Php 5,000 and a standard deviation of
Php 1,000. What is the probability that an employee selected will have a salary of
a. less than Php 5,000?
b. between Php 5,750 and Php 6,500?
c. more than Php 6,600?
Solutions:
Given: μ = 5000, σ = 1000

a. $z = \frac{x - \mu}{\sigma} = \frac{5000 - 5000}{1000} = 0$

P(X < 5000) = P(Z < 0) = 0.5

The probability that an employee selected will have a salary of less than
Php 5,000 is 0.5 or 50%.
b. $z_1 = \frac{x_1 - \mu}{\sigma} = \frac{5750 - 5000}{1000} = 0.75$ and $z_2 = \frac{x_2 - \mu}{\sigma} = \frac{6500 - 5000}{1000} = 1.5$

P(5750 < X < 6500) = P(0.75 < Z < 1.5)
= P(Z < 1.5) − P(Z < 0.75)
= 0.9332 − 0.7734
= 0.1598 or 15.98%
There is a 0.1598 or 15.98% probability that the selected employee will
have a salary between Php 5,750 and Php 6,500.
c. P(X > 6600)
$z = \frac{x - \mu}{\sigma} = \frac{6600 - 5000}{1000} = 1.6$

P(X > 6600) = P(Z > 1.6) = 1 − P(Z < 1.6) = 1 − 0.9452 = 0.0548 or 5.48%.

The chance that an employee selected will have a salary of more than
Php 6,600 is 5.48%.
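
The salary example follows the table-lookup route: standardize first, then read areas from the standard normal distribution. The sketch below mirrors that route, assuming Python's standard-library `statistics.NormalDist` and treating the salaries as normally distributed, as the solution does. The helper name `to_z` is hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 5000, 1000   # mean and standard deviation of salaries (Php)
Z = NormalDist()         # standard normal

def to_z(x: float) -> float:
    """Convert a salary to its z-score."""
    return (x - MU) / SIGMA

p_a = Z.cdf(to_z(5000))                      # a. P(X < 5000)
p_b = Z.cdf(to_z(6500)) - Z.cdf(to_z(5750))  # b. P(5750 < X < 6500)
p_c = 1.0 - Z.cdf(to_z(6600))                # c. P(X > 6600)

for label, p in (("a", p_a), ("b", p_b), ("c", p_c)):
    print(f"{label}. {p:.4f}")
```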

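Solution by use of EXCEL: follow the NORM.DIST steps in Example 5, using 5000
as the mean and 1000 as the standard deviation.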


DO THESE!
Show a complete solution for each problem.
1. Given a normal distribution with μ = 50 and σ = 10, find the probability that X
assumes a value between 45 and 62.
2. Given a normal distribution with μ = 300 and σ = 50, find the probability that X
assumes a value greater than 362.
3. In the qualifying examination for the admittance to college, the mean score was
65 and the standard deviation was 8. If 1,265 students took the qualifying exam,
how many of them scored between 60 and 75?
4. Records show that in a certain hospital the distribution of the “length of stay” of
its patients is normal with a mean of 10.5 days and a standard deviation of 2
days.
a. What percentage of the patients stayed 8 days?
b. What is the chance that a patient will stay in the hospital between 9 and 11
days?
5. An electrical firm manufactures light bulbs that have a length of life that is
normally distributed with mean equal to 800 hours and a standard deviation of 40
hours. Find the probability that a bulb burns between 778 and 834 hours.

CORRELATION AND REGRESSION


Several research studies focus on the relationships between two or more
things. For instance, a teacher may want to know if the study habits of students
relate to their performance in the classroom, a businessman may need to predict the
selling prices of his products based on monthly consumer demand, or an
agriculturist may want to know if the level of experience and the practices of farmers in
planting hybrid corn greatly affect their production. All of these involve
the correlation and regression analysis of data.
These are useful tools in the analysis of data, particularly in dealing with the
relationship between two variables and the prediction of data.

Definition: Correlation analysis is a method used to measure the degree of
relationship or association between two or more variables.


The relationship between two variables can be shown graphically through a
scatter diagram or scatter plot. The relationship may be described by its
magnitude or strength, and the correlation may be positive, negative, or zero.
Types of Correlation:
1. Positive correlation – a direct relationship between two variables exists.
That is, one variable increases (decreases) as the other
increases (decreases).
2. Negative correlation – an inverse relationship exists between the variables.
Here, one variable increases as the other decreases or vice versa.
3. Zero correlation – exists when high or low scores in one variable are not
systematically associated with high or low scores in the other variable. It
indicates that there is no correlation between the variables. The points in the
scatter diagram are scattered randomly.

If all points in the scatter diagram lie on a straight line, the correlation is
said to be perfect. The degree or strength of relationship between two variables may be
computed using the correlation coefficient, denoted by r. It is used primarily to
measure the degree of relationship between two variables that are linearly related.
Its value ranges from −1 to 1 and it is computed using the formula
$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n(\Sigma x^2) - (\Sigma x)^2][n(\Sigma y^2) - (\Sigma y)^2]}}$

where n is the sample size and X, Y are the variables.
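
The formula for r translates almost term by term into code. The following is a minimal sketch in Python; the function name `pearson_r` and the small data sets are assumptions chosen so that the points lie exactly on a line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums in the formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Points on a rising line and on a falling line (illustrative data).
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two calls illustrate the extreme cases: a perfect positive correlation and a perfect negative correlation.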


The correlation coefficient r may be interpreted descriptively depending
upon the computed value. Below are diagrams showing the common types of
correlation.


Diagram showing positive, negative, and zero correlation (Aufmann et al.).

Example 1: A research study was conducted to determine the relationship between
students' grades in English and their grades in Mathematics. Ten students in a
Math 01 class were randomly selected and the results are as follows:

Student            A   B   C   D   E   F   G   H   I   J
English grade      93  89  84  91  90  83  75  81  84  74
Mathematics grade  91  86  80  88  89  87  78  75  85  77

Solution:
Let x represent the English grade and y the Mathematics grade.

Student   x     y     x²      y²      xy
A         93    91    8649    8281    8463
B         89    86    7921    7396    7654
C         84    80    7056    6400    6720
D         91    88    8281    7744    8008
E         90    89    8100    7921    8010
F         83    87    6889    7569    7221
G         75    78    5625    6084    5850
H         81    75    6561    5625    6075
I         84    85    7056    7225    7140
J         74    77    5476    5929    5698

n = 10   Σx = 844   Σy = 836   Σx² = 71,614   Σy² = 70,174   Σxy = 70,839


Calculating for r gives

$r = \frac{10(70{,}839) - (844)(836)}{\sqrt{[10(71{,}614) - (844)^2][10(70{,}174) - (836)^2]}} = \frac{2806}{\sqrt{(3804)(2844)}} = 0.853$

Since r = 0.853, there is a strong positive relationship between the students'
grades in the two subjects.
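
The column totals and the value of r for the grade data can be reproduced with a short script. This is a sketch in plain Python using the ten pairs of grades from Example 1; the variable names are assumptions.

```python
from math import sqrt

english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 74]      # x
mathematics = [91, 86, 80, 88, 89, 87, 78, 75, 85, 77]  # y

n = len(english)
sum_x = sum(english)
sum_y = sum(mathematics)
sum_x2 = sum(x * x for x in english)
sum_y2 = sum(y * y for y in mathematics)
sum_xy = sum(x * y for x, y in zip(english, mathematics))

# Same formula as in the text, built from the column totals.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 3))
```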

Using EXCEL in computing the correlation coefficient, simply follow these steps:
1. Click More Functions under AutoSum, click CORREL then OK.
2. Click Array 1 then highlight all cell entries in X column.
3. Click Array 2 then highlight all cell entries in Y column.
This outcome will appear in the computer display.

4. Press OK. The value of the correlation coefficient will be displayed.


Aside from knowing the degree of relationship between two variables, it is
also possible to estimate the value of one variable based on the other. This process
is known as regression analysis. The relationship is estimated by fitting a straight
line through the given data. This line is known as the regression line, which provides
a minimal error in estimation. It is given by the equation
y = a + bx
where y is the predicted value and b is the regression coefficient (slope of the line),
which is calculated by the formula
$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$
and a is the y-intercept of the line, which can be computed as
$a = \bar{y} - b\bar{x}$
where $\bar{x}$ is the mean of the x-values and $\bar{y}$ is the mean of the y-values.
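
The slope and intercept formulas can be sketched as a small function. The name `regression_line` and the test points, which lie exactly on y = 1 + 2x, are assumptions made for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)                               # y-intercept
    return a, b

# Points that lie exactly on y = 1 + 2x (illustrative data).
a, b = regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

When the points already lie on a line, the fitted line recovers it, which gives a quick sanity check on the formulas.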

Example 2. Find the equation of the regression line in Example 1. Predict the
Mathematics grade of a student whose English grade is 93.
Solution: From Example 1,
n = 10, Σx = 844, Σy = 836, Σx² = 71,614 and Σxy = 70,839
The slope of the line is
$b = \frac{10(70{,}839) - (844)(836)}{10(71{,}614) - (844)^2} = \frac{2806}{3804} = 0.74$.
Note that $\bar{x} = \frac{844}{10} = 84.4$ and $\bar{y} = \frac{836}{10} = 83.6$. So, the y-intercept of the line is
$a = \bar{y} - b\bar{x} = 83.6 - 0.74(84.4) = 21.144$.

Therefore, the regression line equation is y = a + bx → y = 21.144 + 0.74x.

Thus, if the student's grade in English is x = 93, the predicted Mathematics
grade would be y = 21.144 + 0.74(93) = 89.964 or approximately 90.

Using EXCEL application to process the data, simply modify the function used in
example 1. Instead of choosing CORREL, change it to the needed function like
SLOPE or INTERCEPT.
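
The regression line for the grade data can also be computed in a few lines of Python. The sketch below uses the data from Example 1, with x as the English grade and y as the Mathematics grade as defined there, and evaluates the line at x = 93. It reports the result twice: once with the unrounded slope and once with the slope rounded to 0.74 as in the hand computation. The variable names are assumptions.

```python
english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 74]      # x
mathematics = [91, 86, 80, 88, 89, 87, 78, 75, 85, 77]  # y

n = len(english)
sum_x, sum_y = sum(english), sum(mathematics)
sum_xy = sum(x * y for x, y in zip(english, mathematics))
sum_x2 = sum(x * x for x in english)
x_bar, y_bar = sum_x / n, sum_y / n

# Unrounded slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = y_bar - b * x_bar
prediction = a + b * 93

# Hand computation: slope rounded to two decimals before finding a.
b_rounded = round(b, 2)
a_rounded = y_bar - b_rounded * x_bar
prediction_rounded = a_rounded + b_rounded * 93

print(f"unrounded: y = {a:.3f} + {b:.4f}x, y(93) = {prediction:.2f}")
print(f"rounded:   y = {a_rounded:.3f} + {b_rounded}x, y(93) = {prediction_rounded:.3f}")
```

Rounding the slope early shifts the intercept noticeably (about 21.14 versus 21.34), but both versions predict a grade of approximately 90 at x = 93.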

DO THESE!
1. The grades of a class of 9 students on a midterm report (x) and on the final
examination (y) are as follows:
x   77  50  71  72  81  94  99  67  96
y   82  66  78  34  47  85  99  68  99
a. Find the equation of the regression line.
b. Estimate the final examination grade of a student who received a grade of
85 on the midterm report.

2. A study was made by a retail merchant to determine the relation between
weekly advertising expenditures and sales. The following data were recorded:

Advertising cost    Sales in pesos
(per Php1000)       (per Php1000)
40                  385
20                  400
25                  570
20                  495
45                  440
50                  490
40                  385
20                  537
15                  395
40                  610
25                  285
50                  600

a. Plot a scatter diagram.


b. Determine the regression line equation to predict weekly sales from
advertising expenditures.
c. Estimate the weekly sales when the advertising costs are Php35,000.

References:
Aufmann, Richard N., Joanne S. Lockwood, Richard D. Nation, and Daniel K. Clegg.
2013. Mathematical Excursions. 3rd ed. Brooks/Cole, Cengage Learning, USA.
Johnson and Mowry. Mathematics: A Practical Odyssey.
Sobecki et al. Math in Our World.
