
Chapter 4_mathematics as a Tool

The document outlines the principles and methods of data management, emphasizing the importance of accurate data gathering, organization, and presentation. It details various data collection methods, scales of measurement, and statistical tools for data analysis, including measures of central tendency and dispersion. Additionally, it discusses the significance of using statistical data for informed decision-making.

MATHEMATICS

AS A TOOL
DATA MANAGEMENT
Data management is the development, execution, and
supervision of plans, policies, programs, and
practices that control, protect, deliver, and
enhance the value of data and information
assets.
It is an administrative process by which
data is acquired, validated, stored,
protected, and processed.
Objectives
1. Use a variety of statistical tools to process and manage numerical
data;

2. Use the methods of linear regression and correlation to predict
the value of a variable given certain conditions; and

3. Advocate the use of statistical data in making important decisions.


A. Gathering, Organizing,
Representing and Interpreting Data
An investigation should always be based on accurate data, which
requires good management. Correct methods of collecting data,
the right way of organizing them, and good data presentation will
result in precise analysis and interpretation.
• All the steps of data processing should be planned
when the study is designed, before any data are collected. All
forms used for recording data should be carefully designed and
tested to ensure that the data can easily be extracted for
processing.
DATA GATHERING
There are different methods used in gathering or collecting data.
These are:

Direct or Interview Method

A person-to-person encounter between the source of
information, the interviewee, and the one who gathers
information, the interviewer. Interviews can be done
in person, by phone, or over the internet.
Indirect or Questionnaire
Method
The technique in which a questionnaire is
used to elicit the information or data
needed.
REGISTRATION METHOD
Obtains data from the records of a
government
agency authorized by law to keep such data
or information and make these available to
researchers.
OBSERVATION METHOD
Is the technique in which data, particularly
those pertaining to the behavior of
individuals or groups of individuals in a
given situation, are best obtained through
observation.
EXPERIMENTAL METHOD
Is a system used to gather data from the
results of a series of experiments performed
on some controlled and experimental
variables. This is commonly used in
scientific inquiries.
DATA ORGANIZATION
Data collected in whatever
manner are called raw data. Data collected can
be classified according to the scale of
measurement used. There are four levels of
measurement, from lowest to highest scale: the
nominal, ordinal, interval and ratio scales.
Nominal Scale
Assigns names or labels to observations in purely arbitrary
sequence. The labels are used to classify the respondents or
objects without ordering. For instance, if we need to classify the
respondents' preferences on cellphone brands, such as Apple,
Samsung, LG, OnePlus, Vivo, Oppo, etc., we measure on the nominal
scale and the data gathered are nominal data.
MORE EXAMPLES:
Gender: Male, Female
Favorite Colors: Blue, Red, Green, Yellow
Blood Types: A, B, AB, O
Ordinal Scale
Assigns numbers or labels to observations with implied
ordering. Ranking the respondents' preferences means
measuring responses on the ordinal scale, and the data
obtained are called ordinal data.
EXAMPLES:
Rankings in a Competition
Satisfaction Survey

Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied


Interval Scale
Uses real numbers to reflect the distance between ranks of the
respondents or objects in equal units.
It gives the distance between any two numbers of known size,
has an arbitrary (not absolute) zero point, and has a unit of measurement.
The data collected can be manipulated by addition or
subtraction but not division or multiplication.
EXAMPLES:
Temperature in °C or °F, Calendar year
(Expenses, distance, and weight have a true zero point, so they are ratio-scale data.)
Ratio Scale
The numbers are used to reflect the existence of a true absolute
zero point as the origin.
The ratio of two scale points is independent of the unit of measurement.
The data collected have all the properties of interval data and
can be manipulated by multiplication and division.
MORE EXAMPLES:
Data on birth rate, Unemployment rate
PRESENTATION OF DATA
- A systematic arrangement of the data in
tabular form is called tabulation or
presentation of data.
Data may be presented in three forms:
- Textual Form
- Tabular Form
- Graphical Form
TEXTUAL FORM
Textual presentation uses words,
statements or paragraphs with
numerals or numbers to describe
data.
Tabular form
Is a systematic presentation of data in rows and columns.
It is used when numerical facts need to be classified in
arrays.

Graphical form
Shows numerical values or relationship in pictorial
form. It makes use of graphs, symbols or visual aids.
How should a tabular presentation be laid out?

The heading should show the table number, title, and headnote.
The table must also have a box head, stub, footnote, and source
note.
A good chart should possess the following properties:

Accurate - It should not be deceptive, distorted, misleading, or in any way susceptible to
wrong interpretation as a result of inaccurate or careless construction.

Simple - It should be straightforward and not loaded with irrelevant or trivial symbols and
ornamentation.

Clear - It should be easy to read and understand. There should be a truthful and unambiguous
representation of facts.

Attractive - It should be stylish enough to attract and hold attention while keeping a neat, dignified,
and professional appearance.
There are various graphs we can use to
present data. These are:

Line graph - Ideal for visualizing how a variable changes over a period of time.
[Figure 1. BS Applied Statistics Graduates (2012 - 2017)]

Bar graph - Consists of rectangular bars where the height of each bar represents the quantity or
frequency for each category.
[Figure 2. BS Applied Statistics Graduates (2012 - 2017)]

Pie graph - Used to show percentages or the composition by parts of a whole.
[Figure 3. BS Applied Statistics Graduates (2012 - 2017)]

Pictograph or pictogram - Used to immediately suggest the nature of the data.
[Figure: Growth Pattern of Philippine Population: 1960 - 2010, by year in five-year steps; each symbol represents 10 million. Note: Based on Series 2: Moderate Fertility and Mortality Decline Population Projection. *Censal year]
ORGANIZING COLLECTED
NUMERICAL DATA CAN BE
DONE IN TWO WAYS
ARRAY:
Is an arrangement of the numerical data
values according to order of magnitude, either
ascending or descending.
FREQUENCY DISTRIBUTION TABLE
Categorizes the numerical data into
intervals or classes.
STEPS IN CONSTRUCTING A
FREQUENCY DISTRIBUTION
WITH EQUAL CLASS SIZE
1. Determine the range R of the numerical data.

$$R = |\text{highest value} - \text{lowest value}|$$

2. Determine the number of classes K into which the
data are to be grouped, using the Sturges
approximation

$$K = 1 + 3.322 \log_{10} N$$
where N = total number of values to be grouped

3. Determine the class size C.

$$C = R / K$$

4. Determine the lower limit
of the first class.
5. Construct the class intervals
and determine the class
frequencies.
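The five steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: it assumes integer-valued data, rounds K to the nearest whole number, rounds the class size up, and starts the first class at the lowest value. Those rounding and starting-point choices are assumptions, and textbooks differ on them. The function name `frequency_distribution` and the sample heights are illustrative.

```python
import math

def frequency_distribution(values):
    """Build an equal-width frequency table for integer-valued data."""
    n = len(values)
    low, high = min(values), max(values)
    r = abs(high - low)                        # Step 1: range R
    k = round(1 + 3.322 * math.log10(n))       # Step 2: Sturges approximation (rounding is an assumption)
    c = max(1, math.ceil(r / k))               # Step 3: class size C, rounded up (assumption)
    table = []
    lower = low                                # Step 4: first lower limit = lowest value (assumption)
    while lower <= high:                       # Step 5: class intervals and class frequencies
        upper = lower + c - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, freq))
        lower += c
    return r, k, c, table

# Illustrative data: ten heights in cm
heights = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]
r, k, c, table = frequency_distribution(heights)
print(f"R = {r}, K = {k}, C = {c}")
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Because the class size is rounded up, the loop may occasionally produce one class more or fewer than K; the table always covers every observation.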
FREQUENCY HISTOGRAM
A set of vertical bars whose areas are
proportional to the frequencies presented
FREQUENCY POLYGON
Is a line chart plotted along the same scale as
the histogram. The class frequency is plotted
against the class mark.
Data analysis and
interpretation
Data analysis and interpretation involve
measures of central tendency, measures
of dispersion, and measures of
skewness and kurtosis, as well as
estimation of parameter(s) and
hypothesis testing.
Measures of central
tendency
Are measures indicating the
center of a set of data
arranged in order of
magnitude.
Mean or arithmetic mean
(or average)
The most popular and well-known
measure of central
tendency.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Weighted mean
Properties of the mean
The sum of the deviations of the
observations from the mean is
zero. The mean is denoted by $\bar{X}$ for a sample and $\mu$ for a population.
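A short Python sketch of the mean, the zero-sum property of deviations, and the weighted mean. The weighted mean is taken here in its usual form, the sum of weight times value divided by the sum of the weights; that form is an assumption, since the slide gives only the heading. The grades and units are hypothetical.

```python
from statistics import mean

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [4, 7, 8, 2, 8, 8, 9, 2, 5, 7]
x_bar = mean(scores)

# Property: deviations from the mean sum to zero
deviation_sum = sum(x - x_bar for x in scores)

# Hypothetical grades weighted by course units
grades = [90, 80, 70]
units = [3, 2, 1]
wm = weighted_mean(grades, units)

print(x_bar, deviation_sum, round(wm, 2))
```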
WHAT IS THE MEDIAN?
Median
The median is defined as the middle value when a
set of observed values has been arranged in either
ascending (from lowest to highest value) or
descending (from highest to lowest value) order of
magnitude.
• The median divides the array into two equal
parts; that is, 50% of the total number of
observations are less than the median value while the
other 50% are greater than the median value.

• The median is denoted by Md.

Properties of the median
• Unlike the mean, the median does not depend on the magnitude of
every data value, so it is not affected by extreme values.

• The median is determined by the position of an item in the array,
not by the individual values.

• The sum of the absolute distances between the median and the
other values is less than or equal to that from any other
point. In each array, there is only one median.
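As an illustration, the median of an odd-sized and an even-sized data set can be computed with Python's standard `statistics.median`. The helper `total_abs_distance` is a hypothetical name introduced here to check the distance property on one data set; it is a spot check, not a proof.

```python
from statistics import mean, median

odd_set = [4, 7, 8, 2, 2, 9, 3]            # 7 values: the middle value of the sorted array
even_set = [4, 7, 8, 2, 8, 8, 9, 2, 5, 7]  # 10 values: average of the two middle values

md_odd = median(odd_set)
md_even = median(even_set)

def total_abs_distance(point, values):
    """Sum of absolute distances from a point to every value."""
    return sum(abs(v - point) for v in values)

# The median gives a total distance no larger than the mean does
dist_from_median = total_abs_distance(md_odd, odd_set)
dist_from_mean = total_abs_distance(mean(odd_set), odd_set)

print(md_odd, md_even, dist_from_median, dist_from_mean)
```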
WHAT IS THE MODE?
Mode
• The most frequent score in the data set. It is
sometimes considered the most popular option.

• The mode is the value which occurs most often, that is,
the most frequently occurring observation.

• The mode is denoted by Mo.

Example:

1, 2, 2, 2, 8, 1, 4, 10.
The most frequently occurring observation is 2
which appeared thrice. Thus, the mode is 2, and
since there is only one mode, then the distribution is
unimodal.
example:
Suppose BS Applied Statistics has 10 students and
the height (in cm) are as follows: 170, 165, 155, 160,
150, 149, 152, 161, 163, 175.

Since all values occur with equal frequency, then


this data has no mode.
Example:
The result of a survey of the color of cars owned
by faculty shows that 40 were white, 20 were blue,
and 10 were red.

What is the modal color of cars
owned by faculty?
PROPERTIES OF MODE
• The mode is defined as the value that has the highest
frequency in a given set of values.

• It is the value that appears the greatest number of
times.
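The three mode examples can be checked with a short Python sketch. The helper `modes` is a hypothetical name; it wraps the standard `statistics.multimode` and returns an empty list when every distinct value occurs equally often, which follows the "no mode" convention used in the heights example.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return the list of modes, or [] when all distinct values tie in frequency."""
    found = multimode(values)
    distinct = set(values)
    if len(distinct) > 1 and len(found) == len(distinct):
        return []          # every value occurs equally often: no mode
    return found

scores = [1, 2, 2, 2, 8, 1, 4, 10]
heights = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]
car_colors = Counter({"white": 40, "blue": 20, "red": 10})

score_modes = modes(scores)                      # one mode: unimodal
height_modes = modes(heights)                    # no mode
modal_color = car_colors.most_common(1)[0][0]    # category with the highest frequency

print(score_modes, height_modes, modal_color)
```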
C. MEASURES OF DISPERSION
- Measures of dispersion identify how a set of values
spreads or fluctuates.

- The measures of dispersion are the range, the mean absolute
deviation, the variance, the standard deviation, the coefficient
of variation, the coefficient of skewness and the boxplot.
RANGE
• Range is the simplest measure of dispersion. It is
the difference between the highest and lowest scores.
PROPERTIES OF RANGE
• It is a quick but rough measure of dispersion.
• The larger the value of the range, the more
dispersed are the observations.
• It considers only the lowest and the highest values
in the data set.
VARIANCE
The variance, like the mean absolute deviation, takes into
account the variation or spread of all items in a series
about the point of central tendency. The two are distinct
measures: the mean absolute deviation averages the absolute
deviations, while the variance averages the squared deviations.

The variance considers the position of each
observation relative to the mean.
The variance of a given data set is the average of
the squared deviations of the observations
from the mean. The variance of a population is
denoted by σ² (read as "sigma squared") and that of a sample by s² (read
as "s squared").

For Ungrouped Data:

Given the set of values $X_1, X_2, X_3, \ldots, X_N$, the deviation
of the $i$-th observation from the mean is $X_i - \mu$.
The definitional formula of the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

The computational formula of the population variance is

$$\sigma^2 = \frac{N \sum X_i^2 - \left(\sum X_i\right)^2}{N^2}$$

The definitional formula of the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

The computational formula of the sample variance is

$$s^2 = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$

For Grouped Data: The variance of grouped data
can be obtained using the formula

$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$$

where $f_i$ is the frequency and $X_i$ the class mark of the $i$-th class.

Sample variance formula for grouped data:

$$s^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1}$$
PROPERTIES OF THE VARIANCE

1. The variance is always nonnegative;

2. The larger the value of the variance, the more dispersed are the
observations;

3. The variance can be easily manipulated;

4. Each observation contributes to the magnitude of the variance;

5. The unit of measure of the variance is the square of the unit of
measure of the original data set.
Standard Deviation
Based on the deviations of all the scores in a series. It is
always computed from the mean. The standard deviation
is defined as the positive square root of the variance.

Properties of Standard Deviation
The standard deviation has the same properties as the variance,
except that its unit of measure is the same as that of the original data.
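A minimal Python sketch of the variance and standard deviation for ungrouped data, using the seven quiz scores that appear in the skewness and kurtosis examples of this chapter. It assumes the usual conventions: the population variance divides by N and the sample variance by n - 1. The definitional and computational forms are both evaluated to show that they agree.

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [4, 7, 8, 2, 2, 9, 3]
n = len(scores)
mu = sum(scores) / n

# Definitional form: average squared deviation from the mean
definitional = sum((x - mu) ** 2 for x in scores) / n

# Computational form: [N * sum(x^2) - (sum x)^2] / N^2
computational = (n * sum(x * x for x in scores) - sum(scores) ** 2) / n ** 2

population_variance = pvariance(scores)   # divides by N
population_sd = pstdev(scores)            # positive square root of the population variance
sample_variance = variance(scores)        # divides by n - 1
sample_sd = stdev(scores)

print(round(population_variance, 4), round(population_sd, 2))
print(round(sample_variance, 4), round(sample_sd, 2))
```

The population standard deviation rounds to 2.73, the value used for these scores in the worked examples.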
4. Coefficient of Variation

- Also known as relative
dispersion, it is the ratio of the
standard deviation to the
mean and is usually
expressed in percent.
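As a sketch, the coefficient of variation can be computed as the standard deviation divided by the mean, times 100. The function name is illustrative, and the use of the population standard deviation (`pstdev`) rather than the sample one is an assumption.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Relative dispersion in percent: (standard deviation / mean) * 100."""
    return pstdev(values) / mean(values) * 100

cv = coefficient_of_variation([4, 7, 8, 2, 2, 9, 3])
print(f"CV = {cv:.1f}%")
```

Because the result is unit-free, it can be used to compare the spread of data sets measured in different units.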
5. Skewness

Is a measure or criterion
of how asymmetric the
distribution of data is about
the mean.
Using Measures of Central Tendency
• If Mean = Median = Mode, the skewness is zero. (Symmetrical)
• If Mean > Median > Mode, the skewness is positive.
• If Mean < Median < Mode, the skewness is negative.
The formula for the Pearsonian coefficient of
skewness, denoted by SK, is

$$SK = \frac{3(\bar{X} - Md)}{s}$$

where
SK - Pearsonian coefficient of skewness
$\bar{X}$ - the mean
Md - the median
s - the standard deviation

If SK = 0, then the distribution is symmetric
SK > 0, then the distribution is positively skewed
SK < 0, then the distribution is negatively skewed
Example:
Given a random sample of size n = 10,
4, 7, 8, 2, 8, 8, 9, 2, 5, 7

using the measures of central tendency, tell whether the given data
are symmetric, skewed to the left, or skewed to the right.
Mean = 6, Median = 7, Mode = 8
Since Mean < Median < Mode, the distribution is
negatively skewed (skewed to the left).
Example:
The following data represent the scores of 7 BS Applied Statistics students in a quiz:
$X_1 = 4, X_2 = 7, X_3 = 8, X_4 = 2, X_5 = 2, X_6 = 9, X_7 = 3$

Compute the coefficient of skewness.

Solution: $Md = 4$, $\bar{X} = 5$, $s = 2.73$

$$SK = \frac{3(5 - 4)}{2.73} = 1.0989 \approx 1.10$$

Hence, the distribution is positively skewed.
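The Pearsonian coefficient can be checked with a few lines of Python. This sketch assumes SK = 3(mean - median) / s with the population standard deviation, which is what reproduces s = 2.73 for the seven quiz scores; the function name is illustrative.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Pearsonian coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

quiz_scores = [4, 7, 8, 2, 2, 9, 3]
sample_of_ten = [4, 7, 8, 2, 8, 8, 9, 2, 5, 7]

sk_quiz = pearson_skewness(quiz_scores)    # mean 5 > median 4: positive
sk_ten = pearson_skewness(sample_of_ten)   # mean 6 < median 7: negative

print(round(sk_quiz, 2), round(sk_ten, 2))
```

The sign of the coefficient agrees with the comparison of mean and median in both examples.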
6. Coefficient of Kurtosis
Kurtosis measures the flatness or
peakedness of the distribution of a
given data set. It also measures the
degree of departure from the
normal distribution.

[Figure: three curves. Leptokurtic (positive kurtosis), Mesokurtic (normal distribution), Platykurtic (negative kurtosis)]

The coefficient of population kurtosis is denoted by K and is given by

$$K = \frac{\sum (X_i - \mu)^4 / N}{(\sigma^2)^2}$$

The coefficient of sample kurtosis is denoted by K and is given by

$$K = \frac{\sum (X_i - \bar{X})^4 / n}{(s^2)^2}$$

and the coefficient of kurtosis for grouped data, denoted
by $K_G$, is given by

$$K_G = \frac{\sum f_i (X_i - \bar{X})^4 / n}{(s^2)^2}$$

If K < 3, then the distribution is Platykurtic
K > 3, then the distribution is Leptokurtic
K = 3, then the distribution is Mesokurtic
Example:
The following data represent the scores of 7 BS Applied Statistics students in a quiz:
$X_1 = 4, X_2 = 7, X_3 = 8, X_4 = 2, X_5 = 2, X_6 = 9, X_7 = 3$

Compute the coefficient of kurtosis.

Solution:
$\sum (X_i - \bar{X})^4$ = (4-5)⁴ + (7-5)⁴ + (8-5)⁴ + (2-5)⁴ + (2-5)⁴ + (9-5)⁴ + (3-5)⁴ = 532

$$K = \frac{532 / 7}{(2.73^2)^2} = \frac{76}{55.55} \approx 1.37$$

Hence, since K < 3, it is a platykurtic distribution.
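A small Python sketch of the kurtosis coefficient for ungrouped data, written as the average fourth-power deviation divided by the squared variance. It uses the unrounded population variance, so the last digit can differ slightly from a hand computation that rounds the standard deviation first. The function names are illustrative.

```python
def kurtosis_coefficient(values):
    """K = [sum((x - mean)^4) / n] / (variance^2), using the population variance."""
    n = len(values)
    m = sum(values) / n
    second_moment = sum((x - m) ** 2 for x in values) / n   # variance
    fourth_moment = sum((x - m) ** 4 for x in values) / n
    return fourth_moment / second_moment ** 2

def classify_kurtosis(k):
    """Apply the K < 3, K > 3, K = 3 rule from the text."""
    if k < 3:
        return "platykurtic"
    if k > 3:
        return "leptokurtic"
    return "mesokurtic"

quiz_scores = [4, 7, 8, 2, 2, 9, 3]
k_value = kurtosis_coefficient(quiz_scores)
shape = classify_kurtosis(k_value)
print(round(k_value, 2), shape)
```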


D. Measures of
relative position
Measures of position identify
the rank or position occupied
by a data value in an array of
data collected.
1. PERCENTILES

Refer to values that divide a set of observations
into 100 equal parts.

These values are denoted by P1, P2, P3, ..., P99,
which means that 1% of the data fall below P1,
2% below P2, 3% below P3, and so on.
Formula:
2. DECILES

Refer to values that divide a set of observations
into 10 equal parts.

These values are denoted by D1, D2, D3, ..., D9,
which means that 10% of the data fall below
D1, 20% below D2, 30% below D3, and so on.
Formula:
3. QUARTILES
The 1st Quartile, Q1, also called the lower
quartile, is equivalent to P25 or the 25th percentile.
The 2nd Quartile, Q2, is the middlemost score or
the median and is equivalent to the 50th
percentile (P50).
The 3rd Quartile, Q3, also called the upper
quartile, is equivalent to P75 or the 75th
percentile.
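The relationships among quartiles, deciles, and percentiles can be illustrated for ungrouped data with Python's standard `statistics.quantiles`. Its default "exclusive" interpolation method is only one of several conventions and may not match the formula intended on the slides, so treat the numeric values as an assumption; the equivalences Q1 = P25, Q2 = D5 = P50 = median, and Q3 = P75 hold under this method.

```python
from statistics import median, quantiles

data = [4, 7, 8, 2, 8, 8, 9, 2, 5, 7]

quartile_points = quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
decile_points = quantiles(data, n=10)       # 9 cut points: D1 ... D9
percentile_points = quantiles(data, n=100)  # 99 cut points: P1 ... P99

q1, q2, q3 = quartile_points
p25, p50, p75 = percentile_points[24], percentile_points[49], percentile_points[74]
d5 = decile_points[4]

print(q1, q2, q3)
```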
FOR GROUPED DATA:
Probabilities and Normal
Distributions
Counting Principle

If one event can occur in n₁ different ways,
and if for each of these occurrences a
2nd event can occur in n₂ different ways,
then the total number of ways in which the
two events may occur is n₁n₂.
Example
Permutation

An arrangement of objects in a definite
order is called a permutation of the
objects.
Combination

A combination is a set or collection of
objects in no particular order.
Example
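To make the counting principle, permutations, and combinations concrete, here is a small Python sketch. The shirts, pants, and letters are hypothetical; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

# Counting principle: n1 ways for the first event, n2 ways for the second
shirts, pants = 3, 4                 # hypothetical counts
outfits = shirts * pants             # n1 * n2

letters = "ABC"
# Permutations: order matters, so ('A', 'B') and ('B', 'A') are different
ordered_pairs = list(permutations(letters, 2))
# Combinations: order does not matter, so {A, B} is counted once
unordered_pairs = list(combinations(letters, 2))

print(outfits, len(ordered_pairs), len(unordered_pairs))
```

The enumerated counts agree with the closed-form values `math.perm(3, 2)` and `math.comb(3, 2)`.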
Statistical Experiment

A statistical experiment is any process
that generates data.
Example
Sample Space

A sample space is the set of all possible
outcomes of a given statistical
experiment.
Example
Probability of an Event

Event - An event is a subset of the sample
space.
If an experiment can result in any one of N
different equally likely outcomes, and if
exactly n of these outcomes correspond
to event A, then the probability of event A is

$$P(A) = \frac{n}{N}$$
Example
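As an illustration of a sample space, an event, and the probability of an event with equally likely outcomes, consider rolling two fair dice. The experiment and the event "the sum is 7" are hypothetical choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7 (a subset of the sample space)
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

# With N equally likely outcomes and n of them in A, P(A) = n / N
p_a = Fraction(len(event_a), len(sample_space))
print(len(sample_space), len(event_a), p_a)
```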
Probability Distribution

Probability Distributions - populations are characterized in
terms of mathematical models. For continuous variables, the
probability distribution is also known as a density function.

Discrete sample space - a sample space that contains a finite number of
possibilities, or an unending sequence with as many elements as there are whole
numbers.

A random variable defined over a discrete sample space is
called a discrete random variable.

If a sample space contains an infinite number of possibilities
equal to the number of points on a line segment, it is called a
continuous sample space. A random variable defined over a
continuous sample space is called a continuous random
variable.
Normal Distribution

A continuous random variable X is said to be normally distributed if its density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

for −∞ < x < ∞ and for constants μ and σ, where −∞ < μ < ∞, σ > 0, e ≈ 2.71828,
and π ≈ 3.14159.
Standard Normal Distribution

The values of a normal random variable X
are usually expressed in terms
of how many standard deviations they are
away from the mean.
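A brief Python sketch of the normal density and of the standardized value z = (x - μ) / σ, which counts how many standard deviations x lies from the mean. The parameters μ = 100 and σ = 15 are hypothetical. The hand-written density is compared with `statistics.NormalDist` from the standard library as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

mu, sigma = 100, 15            # hypothetical population parameters
x = 130
z = z_score(x, mu, sigma)
density = normal_pdf(x, mu, sigma)
library_density = NormalDist(mu, sigma).pdf(x)

print(z, density)
```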
LINEAR REGRESSION AND
CORRELATION

- A correlation is a relationship or
association between two variables.
Correlation coefficient
- The linear correlation coefficient is a
number calculated from given data
that measures the strength of the
linear relationship between two
variables: x and y.
PEARSON PRODUCT MOMENT
CORRELATION COEFFICIENT
- Is a measure of the linear relationship
between two variables that have been
measured on interval or ratio scales.
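The slide's own formula did not survive extraction, so the sketch below assumes the standard computational form of the Pearson coefficient, r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]). The function name and the paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation from raw sums (standard computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired measurements on interval or ratio scales
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [2, 4, 5, 4, 5]
r = pearson_r(hours_studied, quiz_scores)

# A perfectly linear pair gives r = 1
r_perfect = pearson_r([1, 2, 3], [2, 4, 6])
print(round(r, 3), r_perfect)
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one.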
Spearman Rank
Correlation Coefficient
- Is the best-known measure of
relationship between two variables
based on ranks (ordinal scale).

Its formula is given by

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the ith paired ranks, and
n is the total number of paired measurements.
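A minimal Python sketch of the Spearman formula, taking the usual form r_s = 1 - 6Σd² / (n(n² - 1)). It assumes the data are already expressed as ranks with no ties; the two sets of ranks are hypothetical.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman rank correlation for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to five contestants
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rs = spearman_rs(judge_a, judge_b)
print(rs)
```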
Phi Coefficient
Is a measure of association based on the
chi-square statistic. It is applicable only
to a 2 x 2 contingency table where the
variables are genuinely dichotomous.
Its formula is given by

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

where $\chi^2$ is the chi-square statistic and n is the
total frequency.
For computation purposes, it is desirable to
convert the phi coefficient formula into the form

$$\phi = \frac{|ad - bc|}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$$

where a, b, c, and d are the cell frequencies of the schematic 2 x 2 table:

a  b
c  d
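The two forms of the phi coefficient can be compared numerically. This sketch assumes the conventional cell labels a, b (first row) and c, d (second row) and the standard 2 x 2 chi-square shortcut without continuity correction; the cell counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic of a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def phi_from_chi_square(chi_square, n):
    """Phi as the square root of chi-square over the total frequency."""
    return math.sqrt(chi_square / n)

def phi_from_cells(a, b, c, d):
    """Phi computed directly from the four cell frequencies."""
    return abs(a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

a, b, c, d = 30, 10, 20, 40        # hypothetical cell frequencies
n = a + b + c + d
phi_1 = phi_from_chi_square(chi_square_2x2(a, b, c, d), n)
phi_2 = phi_from_cells(a, b, c, d)
print(round(phi_1, 4), round(phi_2, 4))
```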
Contingency Coefficient
The contingency coefficient
is the most widely used and oldest
measure of association
based on the chi-square
statistic.
Its formula is given by

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$

where $\chi^2$ is the chi-square statistic and n is the total frequency.
Cramer's V Coefficient
Cramer's V coefficient is a
measure of the degree of
association or relationship
between two sets of nominal
data.
Its formula is based on the chi-square
statistic and is given by

$$V = \sqrt{\frac{\chi^2}{n(L - 1)}}$$

where $\chi^2$ is the chi-square statistic, n is the total frequency,
and L is the minimum of the number of rows and columns.
Example:
A study was conducted to determine whether there is a significant association
between the highest educational attainment of the father/mother and the
number of children in the household. A sample of 525 households was randomly
selected with the following results:
Testing for the Significance of Cramer's V Coefficient
1. Ho: There is no significant association between educational
attainment and number of children.
Ha: There is a significant association between educational
attainment and number of children.
2. Level of significance α = 0.05 and sample size n = 525
3. Test Statistic: Chi-square test
4. Critical Region: Reject Ho if χ²c > 9.488
5. Computations: Compute the chi-square test statistic using the
formula
χ²c = {(50-64)²/64} + {(25-39)²/39} + {(60-31)²/31}
+ {(70-95)²/95} + {(88-58)²/58} + {(42-46)²/46}
+ {(138-90)²/90} + {(40-55)²/55} + {(20-44)²/44}

χ²c = 91.54

6. Decision: Since χ²c = 91.54 > 9.488, reject Ho and conclude that
there is a significant association between educational attainment
and number of children in the household.
The degree of association computed
using the formula of the contingency
coefficient is

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{91.54}{91.54 + 525}} \approx 0.385$$

The degree of association computed
using the formula of Cramer's V coefficient
is

$$V = \sqrt{\frac{\chi^2}{n(L - 1)}} = \sqrt{\frac{91.54}{525(3 - 1)}} \approx 0.295$$
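The two degrees of association can be reproduced from the reported chi-square value with a short Python sketch. It takes χ² = 91.54 and n = 525 from the example as given, and assumes a 3 x 3 table (nine cells, consistent with the critical value 9.488 at 4 degrees of freedom), so L = 3. The function names are illustrative.

```python
import math

def contingency_coefficient(chi_square, n):
    """C = sqrt(chi^2 / (chi^2 + n))."""
    return math.sqrt(chi_square / (chi_square + n))

def cramers_v(chi_square, n, rows, cols):
    """V = sqrt(chi^2 / (n * (L - 1))), with L = min(rows, cols)."""
    smaller_dimension = min(rows, cols)
    return math.sqrt(chi_square / (n * (smaller_dimension - 1)))

chi_square = 91.54      # value reported in the example
n = 525                 # sample size from the example
c_value = contingency_coefficient(chi_square, n)
v_value = cramers_v(chi_square, n, rows=3, cols=3)   # 3 x 3 table is an assumption
print(round(c_value, 3), round(v_value, 3))
```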
THANK
YOU!
