Data Management LESSON 1: Data Collection TIME FRAME: 2 Hours Code: Learning Outcome (S)
Data Management
LEARNING OUTCOME(S):
At the end of the lesson, you are expected to:
1. Determine the sample size from a given population
2. Perform sampling from a target population
3. Use the different data gathering techniques
In the conduct of a study, the population is defined in keeping with the objective of the
study. The population is the totality of the elements (persons, objects, things, and animals) from
which data are to be collected. If the population is large, collecting data can be costly and time
consuming. Hence, sampling is recommended. A sample is a representative part of the population.
For example, you might want to survey all the millions of senior high school students in the
Philippines, but because that population is very large, you take a sample instead. That may be a
thousand senior high school students in a selected region of the Philippines.
The sample should be of substantial size. A sample that is too large may waste
money, time, and resources, while a sample that is too small may lead to inaccurate results.
This inaccuracy is called sampling error.
The data from the sample are used to make inferences about the population. So, how large
should our sample be? That depends on how closely we want our results to match the entire population.
Finding a sample size can be one of the challenging tasks. It depends on several factors,
including the size of the population. There are many approaches to determining the sample size.
1. Conduct a census.
If the population is small, say, 𝑁 ≤ 200, survey all the elements in the
population. A census eliminates the sampling error.
2. Use a sample size of a similar study.
Use the sample size of a study similar to the one you plan to work on. The
disadvantage of this method is the risk of repeating any errors that were made in
calculating that sample size.
3. Use a table.
Published and online tables for sample size are available. They provide the sample size for a
given set of criteria. You can visit this website:
https://www.research-advisors.com/tools/SampleSize.htm
4. Use a formula.
Many different formulas can be used depending on the available parameters.
If the population standard deviation is known, use Cochran’s sample size
formula.
If the population standard deviation is not known, use Slovin’s formula.
The purpose of a survey is to make inferences about the population. In most cases, a population
parameter such as the standard deviation is impossible to obtain. Slovin’s formula is simpler and
requires only the population size and the margin of error.
To use the formula, first determine the margin of error “e”. The margin of error describes the
width of an interval estimate. In a confidence interval, it is the range of values below and above
the sample statistic.
For example, suppose we wanted to know the percentage of college students who use
Facebook. We could devise a sample design to ensure that our sample estimate will not differ
from the true population value by more than, say, 5% (the margin of error) 95% of the
time (the confidence level).
If you know the standard deviation of the statistic, use the first equation to compute the
margin of error. Otherwise, use the second equation. While these formulas are available, it is
customary to use a margin of error of 5% or less. The smaller the margin of error, the larger
the required sample size, but the smaller the resulting error.
$$n = \frac{N}{1 + Ne^2}$$
Where
N = population size
n = sample size
e = margin of error
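For example, with N = 10000 and e = 0.05:
$$n = \frac{N}{1 + Ne^2} = \frac{10000}{1 + 10000(0.05)^2} = \frac{10000}{1 + 25} \approx 384.6$$
Rounding up to the next whole respondent gives n = 385.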
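The computation above can also be sketched in Python. This is only an illustration: the function name `slovin_sample_size` is made up for this example, and rounding up to the next whole number is an assumption that matches the worked answer of 385.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula, n = N / (1 + N * e^2), rounded up."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # a fraction of a respondent is rounded up


# Worked example from the lesson: N = 10000, e = 0.05
sample_size = slovin_sample_size(10000, 0.05)
print(sample_size)
```

Trying a smaller margin of error, such as 0.03, shows how quickly the required sample size grows.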
Do worksheet 4A
Sampling Techniques
After computing the sample size, our next concern is how to select the sample from
the population. The sample should reflect the characteristics of the population from which it is drawn.
Sampling is the act, process, or technique of selecting an appropriate sample, or representative
part, of the population.
We use sampling rather than complete enumeration (census) because it is more convenient
and cheaper.
Sampling techniques are classified as either probability or nonprobability.
Probability sampling: Samples are randomly chosen. Each member of the population has a
known, nonzero chance of being selected.
Nonprobability sampling: Personal judgment plays a very important role in the selection. The
members of the population do not have a known chance of being included in the sample.
Simple random sampling: Each member of the population has an equal chance of being included
in the sample. It can be done using a table of random numbers or the lottery (fish-bowl) method.
For example, suppose a store owner wants to evaluate the performance of his staff. He
writes the names of all his staff on pieces of paper, and then draws 20 names; those staff
members will be part of the survey.
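The fish-bowl draw can be imitated with Python's standard `random` module. This is a minimal sketch: the staff list of 50 names is hypothetical (the lesson does not give the names or the total number of staff), and the seed is fixed only so that the draw can be repeated.

```python
import random

# Hypothetical staff list; the lesson does not say how many staff there are.
staff = [f"Staff {i}" for i in range(1, 51)]

rng = random.Random(42)  # fixed seed only to make the example repeatable

# Draw 20 names without replacement; every name has an equal chance.
selected = rng.sample(staff, k=20)
print(selected)
```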
This technique is not recommended when the population is large, since it is often
difficult to identify the sample.
Systematic sampling: Every kth member of the population is selected, with the starting point
determined at random, where k = N/n.
For example, if N = 100 and n = 20, then k = 5. So, select every 5th member in the list, with the
starting point determined at random.
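A small Python sketch of this procedure follows. The function name `systematic_sample` and the numbering of members from 1 to 100 are assumptions made for illustration, and k is taken as the whole-number part of N/n.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Take every k-th member (k = N // n) after a random start among the first k."""
    rng = rng or random.Random()
    if sample_size < 1 or sample_size > len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    k = len(population) // sample_size
    start = rng.randrange(k)  # random starting index: 0, 1, ..., k - 1
    return population[start::k][:sample_size]


members = list(range(1, 101))  # N = 100 members, numbered 1 to 100
chosen = systematic_sample(members, 20, random.Random(7))  # n = 20, so k = 5
print(chosen)
```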
Stratified random sampling: This is used if the population can be subdivided into strata. The
samples are then randomly selected from each stratum.
For example, consider a survey to find out if families living in a certain municipality with a
population of 5000 are in favor of an excise tax. Suppose you want to sample 370 families and ensure
that all income groups are well represented. The respondents will be divided into income groups
as shown below:
This method of calculating the sample size for each stratum is called proportional
allocation.
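In proportional allocation, each stratum receives a share of the sample equal to its share of the population, that is, (stratum size / N) × n. The sketch below illustrates this in Python. The income-group sizes are hypothetical figures chosen only so that they add up to 5000; they are not the figures of the lesson's table, and the function name is also made up.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Give each stratum a share of the sample equal to its share of the population."""
    population = sum(strata_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in strata_sizes.items()
    }


# Hypothetical income groups (NOT from the lesson); together they total N = 5000.
income_groups = {"High income": 500, "Middle income": 1500, "Low income": 3000}

allocation = proportional_allocation(income_groups, 370)
print(allocation)
```

With other figures, rounding can make the stratum samples add up to one more or one less than n, so the total should be checked and adjusted by hand.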
Cluster sampling: This is usually used if the population is very large. It is sometimes called area
sampling. The members of the population are divided into groups or clusters.
For example, we want to determine the average daily water consumption of families living in
Malaybalay City. There are 46 barangays in Malaybalay City. We can draw a random sample of
10 barangays using simple random sampling, and then a certain number of families from each of the
10 barangays are chosen.
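The two stages of this example can be sketched in Python as follows. The barangay names, the 200 families per barangay, and the 15 families drawn from each chosen barangay are all assumptions for illustration; the lesson only states that there are 46 barangays and that 10 are selected.

```python
import random

rng = random.Random(2024)  # fixed seed only to make the example repeatable

# Hypothetical frame: 46 barangays, each with a made-up list of 200 family IDs.
barangays = {
    f"Barangay {b}": [f"B{b}-F{f}" for f in range(1, 201)]
    for b in range(1, 47)
}

# Stage 1: simple random sample of 10 barangays (the clusters).
chosen_barangays = rng.sample(sorted(barangays), k=10)

# Stage 2: simple random sample of families within each chosen barangay.
families_per_barangay = 15  # assumed; the lesson says only "a certain number"
cluster_sample = {
    name: rng.sample(barangays[name], k=families_per_barangay)
    for name in chosen_barangays
}
print(chosen_barangays)
```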
Types of Nonprobability Sampling
Convenience sampling: Samples are chosen because of their availability. It also offers
convenience to the researcher.
For example, suppose you want to know the average income of Filipinos. You select the
sample from among mobile phone users.
Purposive sampling: The sample is selected on the basis of predetermined criteria set by the
researcher, based on the objective or purpose of the study.
For example, the research is about the lives of teenagers who are already mothers. Of course,
only teenage mothers will be the respondents.
Quota sampling: This is the nonprobability counterpart of stratified random sampling. The only
difference is that the selection of the samples in each stratum is not random.
For example, a survey is conducted to determine the most popular noontime show. Each
field researcher is assigned to a certain area and given a quota of 300 viewers.
Snowball or network sampling: This is used if the desired sample is difficult to find or locate.
It relies on referrals, which is why it is sometimes known as referral sampling.
For example, a researcher is interested in drug users, alcoholics, or HIV-positive persons. The
researcher asks the participants for referrals to locate other potential respondents.
Do worksheet 4B
WORKSHEET 4A
Name:______________________________ Date:_____________
Class Schedule:_______________________ Instructor:_________
5. Does the sample size help in reducing errors in the study? Why?
Name:______________________________ Date:_____________
Class Schedule:_______________________ Instructor:_________
Plan a study to be conducted. Select one (1) topic from the following:
a. Average family size among Motorela drivers.
b. Monthly allowance and expenditures of students in BukSU.
c. High School GPA of freshmen students in BukSU.
d. Average number of hours spent daily on Facebook by students.
e. Opinion of students about same-sex marriage.
Topic : _______________________________________________________________
Title : ____________________________________________________________________
Discuss who will be your sample and the technique you are planning to use:
Data Gathering Techniques
A characteristic measured from a person, object, or thing is called a variable. The values
of the observations under a specific variable are called data. Data can be classified as qualitative
or quantitative. Qualitative data are expressed in non-numeric form, such as categories, kinds, brands,
or names. Quantitative data are expressed in numbers.
There are many ways of collecting data. Here are the following:
Direct method
Observation – the researcher observes the situation directly to gather data. The researcher
may use video or audio recording.
Interview – in-person or by telephone.
Indirect method
Questionnaire – a printed list of questions is used with or without the presence of the
researcher. It can be mailed or handed personally.
Registration method
This method gathers data from documents required by law. Data are sometimes
obtained from published or unpublished documents. Age, sex, and other information can be obtained
from the Philippine Statistics Authority (PSA), the number of registered cars can be obtained from
the Land Transportation Office (LTO), etc.
Statistical Instruments
Types of questionnaire
Structured or closed format. The questions require one answer only.
Examples:
1. Are you in favor of the K-12 curriculum?
___ Yes __ No
Unstructured or open format. The questions can have different answers. There are no limits
as to the responses of respondents.
Examples:
1. What do you think are the reasons for bullying?
2. Why did you choose this course?
Rating scale. It is often used to ask respondents to rate something. One common example is the
Likert scale.
Examples:
1. The quality of education nowadays is improving.
__Strongly disagree __ Disagree __Agree __Strongly Agree
Form a group of four (4). Each group will submit a title from the selected topic listed in
Worksheet 4B due on _________________.
Instructions:
1. Write a brief rationale for choosing the topic.
Why did you choose the topic?
What is its importance?
Who will benefit from it?
2. Identify the target population and its actual number.
3. Compute the sample size.
4. Identify the sampling and data gathering techniques.
LESSON 2: Measures of Central Tendency
LEARNING OUTCOME(S):
At the end of the lesson, you are expected to:
1. Summarize and describe the data using measures of central tendency
2. Determine when the mean, median, or mode is appropriate to use for given data
3. Compute descriptive measures using Excel
4. Write findings and draw conclusions using descriptive measures
Giving numerical values to the important features of a person, such as weight, age, or body
measurements, is one way of describing that person's physical appearance.
Data collected in a study can be organized using tables and graphs. They can also be summarized by a
single value, such as a descriptive measure.
The most common descriptive measures are the measures of central tendency and the measures
of dispersion.
The measures of central tendency, or averages, are measurements that tell us where the
middle of the data lies. These include the mean, median, and mode.
The Mean
The arithmetic mean, or simply the mean, is defined as the sum of a set of
observations divided by the number of observations in that set. The biggest
disadvantages of this descriptive measure are that it cannot be used for qualitative data
and that it is affected by extreme values.
$$\mu = \frac{\sum X}{N} \text{ (population mean)} \qquad \bar{x} = \frac{\sum X}{n} \text{ (sample mean)}$$
Example 1: Consider the ages of the respondents. Solve for the mean.
8, 9, 11, 5, 12, 17, 7, 23, 39, 15
Solution:
$$\bar{x} = \frac{8 + 9 + 11 + 5 + 12 + 17 + 7 + 23 + 39 + 15}{10} = \frac{146}{10} = 14.6$$
Findings:
The oldest respondent is 39 years old, while the youngest is 5 years old.
The average age of the respondents is 14.6 years.
Can you give another finding?
Example 2: The students were asked about their weekly allowance (in pesos).
500 250 750 1000 800 750 700 500 500 1000
$$\bar{x} = \frac{500 + 250 + 750 + 1000 + 800 + 750 + 700 + 500 + 500 + 1000}{10} = \frac{6750}{10} = 675$$
Example 3: Another group of students were asked about their weekly allowance (in
pesos).
500 250 750 1000 800 750 700 500 50,000 10,000
$$\bar{x} = \frac{500 + 250 + 750 + 1000 + 800 + 750 + 700 + 500 + 50000 + 10000}{10} = \frac{65250}{10} = 6525$$
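The two computations can be checked with a short Python sketch of the sample mean formula. The function name `arithmetic_mean` and the variable names are made up for this example; the data are the allowances from Examples 2 and 3.

```python
def arithmetic_mean(values):
    """Sample mean: the sum of the observations divided by their number."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


# Weekly allowances (in pesos) from Example 2 and Example 3
example_2 = [500, 250, 750, 1000, 800, 750, 700, 500, 500, 1000]
example_3 = [500, 250, 750, 1000, 800, 750, 700, 500, 50000, 10000]

mean_2 = arithmetic_mean(example_2)
mean_3 = arithmetic_mean(example_3)
print(mean_2, mean_3)
```

The two lists differ only in their last two values, which makes it easy to see how much two extreme values move the mean.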
Challenge questions:
1. In Example 2, does the average weekly allowance of 675 pesos
describe or represent the allowance of the students?
2. In Example 3, does the average weekly allowance of 6525 pesos
describe or represent the allowance of the students?
3. Do the averages in the examples lie in the middle of the data set?
4. What can you say about the results?
The Weighted Mean
The weighted mean takes into account the weights assigned to the observed values according to
their relative importance.
Example:
Find the mean grade point average (GPA) of Juan, presented in the frequency table
below.
Subject      Grade   Units
Statistics   4.0     3
English      2.0     3
Physics      1.5     5
P.E.         1.0     2
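A weighted mean is the sum of each value times its weight, divided by the sum of the weights. The sketch below applies this to Juan's table. It assumes that the first number in each row is the grade and the second is the number of units (the weight); the table itself does not label its columns, and the function name is made up for this example.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Assumed reading of the table: grade first, then units.
grades = [4.0, 2.0, 1.5, 1.0]  # Statistics, English, Physics, P.E.
units = [3, 3, 5, 2]

gpa = weighted_mean(grades, units)  # 27.5 grade points over 13 units
print(round(gpa, 2))
```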
The Median
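Arrange the data in ascending order. Let n be the total number of observations and let
$x_{(i)}$ be the i-th value in the ordered data.
If n is odd, the median is the value at position (n + 1)/2:
$$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$$
If n is even, the median is the average of the values at positions n/2 and (n/2) + 1:
$$\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$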
Example 1: The students were asked about their weekly allowance (in pesos).
500 250 750 1000 800 750 700 500 50,000 10,000
Arranged in ascending order:
250 500 500 700 750 750 800 1000 10,000 50,000
The median is the middle value of the ordered data. Since n = 10 is even and n/2 = 5, the median
is the average of the 5th and 6th values.
$$\text{median} = \frac{750 + 750}{2} = 750$$
The median weekly allowance of the students is 750 pesos.
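The odd and even cases can be sketched in Python as follows. The function name `median_value` is made up for this example; the first data set is the allowance data above, and the second call uses a small list of five ages only to exercise the odd case.

```python
def median_value(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # position (n + 1) / 2 when counting from 1
    # positions n / 2 and (n / 2) + 1 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


allowances = [500, 250, 750, 1000, 800, 750, 700, 500, 50000, 10000]
median_allowance = median_value(allowances)   # even case, n = 10
median_odd = median_value([8, 9, 11, 5, 12])  # odd case, n = 5
print(median_allowance, median_odd)
```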
English Science Science Math English Math Logic English Filipino Filipino
Challenge questions:
1. What is the subject most liked by the students?
2. Can you use the mean? The median? When can we use the median?
3. How are we going to summarize the data?
The Mode
It is the value in the set that occurs most often. A set of data can be bimodal, trimodal,
or multimodal. It is also possible to have a set of data with no mode.
Properties of Mode
1. It is a quick approximation of the average.
2. It is the most unreliable among the three measures of central tendency because its
value is undefined for some sets of observations.
3. It exists in both quantitative and qualitative data.
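The sketch below shows one way to find the mode in Python. The function name `modes` is made up, and the convention that a data set has no mode when every value occurs only once is an assumption of this example. The first list is the allowance data from Example 2 of the mean; the other lists are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values(), default=0)
    if highest <= 1:
        return []  # every value occurs only once (or there is no data)
    return [value for value, count in counts.items() if count == highest]


allowances = [500, 250, 750, 1000, 800, 750, 700, 500, 500, 1000]
allowance_modes = modes(allowances)                  # one mode
bimodal = modes([1, 2, 2, 3, 3])                     # hypothetical data, two modes
no_mode = modes([1, 2, 3])                           # hypothetical data, no mode
color_modes = modes(["red", "blue", "red", "green"])  # hypothetical qualitative data
print(allowance_modes, bimodal, no_mode, color_modes)
```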
Solution:
Do worksheet 4C
WORKSHEET 4B
Name:______________________________ Date:_____________
Class Schedule:_______________________ Instructor:_________
A. Two groups of students were asked to encode a story, and the time (in minutes) they
spent encoding was recorded as follows:
A 8 8 9 10 12 10 9 10 10
B 11 8 12 12 7 9 10 9 8
B. Mark, a scholar at a certain university, received his grades for the second semester.
Subject Unit Grade
Statistics 3 2.75
English 3 1.75
Chemistry 5 2.0
Rizal Life and Work 3 1.25
Geometry 3 3.0
LEARNING OUTCOME(S):
At the end of lesson, you are expected to:
1. Summarize and describe the data using measures of dispersion
2. Solve for the different measures of dispersion
3. Compute measures of dispersion using Excel
4. Write findings and draw conclusions using measures of dispersion.
In the previous lesson, we learned about measures that lie at the center of
the data set. In this lesson, we continue with descriptive measures, but the focus is
on how spread out or varied the data are from the center of the distribution.
Consider the two (2) data sets:
A: 100 65 75 85 95 Mean = 84
B: 84 86 85 82 83 Mean = 84
The average grade of both classes is 84. We can say that the two classes have equal
performance. But the average does not tell us how spread out the scores are. Figure 1 shows
how spread out the data are.
Figure 1. Scores of classes A and B plotted on a common axis (Data, from 65 to 100).
In this lesson, we are going to consider only two measures, namely the range and the
standard deviation.
The Range
It is defined as the difference between the largest score and the smallest score in the
set of data: Range = highest score – lowest score.
The Standard Deviation is a measure of how spread out the numbers are. It is the square root
of the variance.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where,
σ = population standard deviation
s = sample standard deviation
x = data
μ = population mean
x̅ = sample mean
n = number of scores in the sample
N = population size
A: 100 65 75 85 95 Mean = 84
B: 84 86 85 82 83 Mean = 84
For class A:
x      x̅     (x − x̅)    (x − x̅)²
100    84    16         256
65     84    -19        361
75     84    -9         81
85     84    1          1
95     84    11         121
∑(x − x̅)² = 820
For class B:
x      x̅     (x − x̅)    (x − x̅)²
84     84    0          0
86     84    2          4
85     84    1          1
82     84    -2         4
83     84    -1         1
∑(x − x̅)² = 10
For class A:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{820}{4}} = 14.32$$
For class B:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4}} = 1.58$$
Class   Mean   Range   Standard deviation
A       84     35      14.32
B       84     4       1.58
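The range and the sample standard deviation of the two classes can be checked with the Python sketch below. The function names are made up for this example, and the sample formula (dividing by n − 1) is used, as in the computation above.

```python
import math


def value_range(values):
    """Range: highest score minus lowest score."""
    return max(values) - min(values)


def sample_std(values):
    """Sample standard deviation: square root of sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


class_a = [100, 65, 75, 85, 95]
class_b = [84, 86, 85, 82, 83]

for label, scores in (("A", class_a), ("B", class_b)):
    print(label, value_range(scores), round(sample_std(scores), 2))
```

Python's built-in `statistics.stdev` uses the same n − 1 formula and can serve as a cross-check.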
Notice that the computation is manageable if there are few cases. If there is a large amount of
data, a calculator or a computer is necessary. Below is a guide to using Excel to compute the
standard deviation.
Excel Application
Using Excel, find the descriptive statistics of the monthly income of 10 employees.
1. Encode the data in Excel.
2. Click File, then Options, click Add-ins, then Go.
3. Check Analysis ToolPak, then OK.
4. Click Data in the toolbar. You can see Data Analysis in the upper right portion of the
worksheet.
5. Click Data Analysis. A Data Analysis box will pop up. Click Descriptive Statistics.
6. The Descriptive Statistics box will appear. Highlight the data in the column for the input range.
7. Check Summary statistics.
Mean 32360
Standard Error 3399.941176
Median 29500
Mode 45000
Standard Deviation 10751.55803
Sample Variance 115596000
Kurtosis -1.323590825
Skewness 0.425396135
Range 29900
Minimum 19100
Maximum 49000
Sum 323600
Count 10
Name:______________________________ Date:_____________
Class Schedule:_______________________ Instructor:_________
Ten (10) employees of 4L Company were on a business trip. The following is the summary of
their expenses.
1. Which set of data is more variable: transportation or board and lodging? Why?
3. In measuring the variability of data, which do you prefer: the range or the standard deviation?
OUTPUT OF THE WEEK
Continue the planned study from the previous output with the existing grouping and do the
following:
1. Formulate statements of the problem.
2. Draft your instrument
3. Conduct a survey using the approved instrument.
4. Tabulate the data gathered in excel.
5. Organize, summarize, and present the data using the different descriptive measures.
6. Draw your findings and analysis based on the result.
Target Population:
Exact number of target population is not stated; number of samples is incorrectly computed
Exact number of target population is stated; number of samples is incorrectly computed
Exact number of target population is stated; number of samples is incorrectly computed
Exact number of target population is stated; number of samples is correctly computed
Sampling and data gathering techniques:
Sampling and data gathering techniques are not appropriate to the topic
Sampling and data gathering techniques are appropriate to the topic but not clearly explained
Sampling and data gathering techniques are appropriate to the topic and explained but with few inaccuracies
Sampling and data gathering techniques are appropriate to the topic and are clearly explained
RUBRIC FOR THE SECOND WEEK OUTPUT
Conclusion - Findings and Analysis (30%):
Presents an illogical explanation for findings and is incomplete.
Presents an illogical explanation for findings but is complete.
Presents a logical explanation for findings and is incomplete.
Presents a logical explanation for findings and is complete.