INTRODUCTION-Data-Mgt
DEFINITION OF TERMS
Statistics
It is the art and science of collecting, presenting, analyzing,
and interpreting data (Broto, 2006).
The science that deals with the collection, tabulation or
presentation, analysis and interpretation of numerical or
quantitative data (Pagoso, 1997).
This refers to the techniques by which quantitative data are
collected, presented, organized, analyzed and interpreted.
The focal point of modern statistical analysis is decision
making.
Descriptive Statistics
This includes the techniques concerned with summarizing and
describing numerical data. These methods can be either graphical
or computational. They are used to present and analyze information
in a convenient, usable, and understandable form for data
management and other purposes.
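Inferential Statistics
The statistical tests commonly used to draw conclusions about a population from sample data include: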
Parametric tests
z-test
t-test
Pearson product-moment correlation analysis
regression analysis
ANOVA, ANCOVA, MANOVA
Non-parametric tests
chi-square test
Wilcoxon Rank-Sum Test
Kruskal-Wallis Test
Spearman Rank-Order Coefficient of Correlation (rs)
Sign Test
McNemar’s Test
Friedman (Fr) Test
Kendall’s Coefficient of Concordance
Population & Sample
Population - It is the totality of all actual or conceivable objects of a
certain class under consideration. It is a complete set of individuals,
objects or measurements having some common observable
characteristics.
Ungrouped data
Data that have not been organized or classified; such data usually
exhibit no pattern.
Tabulation - the process of grouping or classifying data for purposes of
analysis and interpretation.
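The following minimal Python sketch shows tabulation. It uses a small set of hypothetical ungrouped quiz scores; the data and variable names are illustrative and do not come from the source. The raw values are grouped by distinct value and counted, which turns a patternless list into a simple frequency table.

```python
from collections import Counter

# Hypothetical ungrouped data: quiz scores of 12 students (illustrative only)
scores = [7, 5, 8, 7, 9, 5, 7, 6, 8, 10, 7, 6]

# Tabulation: group the raw values and count how often each one occurs
freq = Counter(scores)

print("Score  Frequency")
for score in sorted(freq):
    print(f"{score:>5}  {freq[score]:>9}")
```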
https://cdn.ymaws.com/www.safestates.org/resource/resmgr/connections_lab/glossary_citation/Primary__Secondary_Data_Defi.pdf
https://researchguides.ben.edu/c.php?g=282050&p=7037027
Examples of primary data sources: surveys, observations, experiments,
questionnaires, personal interviews, etc.
Examples of secondary data sources: government publications, websites,
books, journal articles, internal records, etc.
Primary Data Sources
Primary data analysis is analysis in which the same individual or team of
researchers designs, collects, and analyzes the data for the purpose of
answering a research question (Koziol & Arthur, nd).
Advantages to Using Primary Data
•You collect exactly the data elements that you need to answer your
research question (Romano).
•You can test an intervention, such as an experimental drug or an
educational program, in the purest way (a double-blind randomized
controlled trial) (Romano).
•You control the data collection process, so you can ensure data quality,
minimize the number of missing values, and assess the reliability of
your instruments (Romano).
Secondary Data Sources
Existing data, collected for another purpose, that you use to answer
your research question (Romano).
Advantages of Working with Secondary Data
•Large samples can provide population estimates: for example, state data
can be combined across states to get national estimates (Shaheen, Pan,
& Mukherjee).
•Less expensive to collect than primary data (Romano).
•It takes less time to collect secondary data (Romano).
•You may not need to worry about informed consent or human-subjects
restrictions (Romano).
Issues in Using Secondary Data
•Design and data collection already completed (Koziol & Arthur, nd).
•Data may not facilitate a particular research question.
•Information regarding study design and data collection procedures may be scarce.
•Data may potentially lack depth (the greater the breadth the harder it is
to measure any one construct in depth) (Koziol & Arthur, nd).
•Certain fields or departments (e.g., experimental programs) may place
less value on secondary data analysis (Koziol & Arthur, nd).
•Statistical analysis of the data often requires special techniques.
METHODS OF COLLECTING DATA
A. Direct (interview) method: this is personal communication with
the individual you want to interview.
$$n = \frac{N}{1 + Ne^2} = \frac{5000}{1 + 5000(0.05)^2} \approx 370 \text{ samples}$$
Example
Suppose that N = 6500 and the margin of error is 4% (e = 0.04). Then the
sample size is
$$n = \frac{N}{1 + Ne^2} = \frac{6500}{1 + 6500(0.04)^2} = \; ? \text{ samples}$$
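The same computation can be sketched in Python. The function names are hypothetical. The rounding rule is an assumption inferred from the worked examples, which report 370.37 as 370 and 167.78 as 168 (rounding to the nearest whole number); some texts always round up instead.

```python
import math

def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    if N <= 0 or not 0 < e < 1:
        raise ValueError("N must be positive and e must lie between 0 and 1")
    return N / (1 + N * e ** 2)

def round_half_up(x):
    # Python's built-in round() sends halves to the nearest even number,
    # so round half up explicitly to match hand computation.
    return math.floor(x + 0.5)

n = slovin(5000, 0.05)
print(f"n = {n:.2f} -> {round_half_up(n)} samples")  # n = 370.37 -> 370 samples

# The exercise above: N = 6500, e = 4%
print(round_half_up(slovin(6500, 0.04)))
```

Calling `slovin(289, 0.05)` reproduces the n = 167.78 (168) used in the enrollment example later in this section.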
Lynch et al. formula
Determining the sample size using the Lynch et al. formula:
$$n = \frac{N Z^2 \, p(1 - p)}{N d^2 + Z^2 \, p(1 - p)}$$
where
Z = the value of the standard normal variable (1.96) for a reliability level of 0.95
p = the largest possible proportion (0.50)
d = sampling error
N = population
n = sample size
Example
Let N = 1000
The desired reliability level is 0.95
The allowed sampling error is 0.05
The proportion of the target population with a certain characteristic
important to the study is 0.50. The sample size is then computed as:
$$n = \frac{1000(1.96)^2 \times 0.50(1 - 0.50)}{1000(0.05)^2 + (1.96)^2 \times 0.50(1 - 0.50)}$$
n = 277.54 or 278
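Below is a minimal Python sketch of the Lynch et al. formula. It assumes the default values listed above (Z = 1.96, p = 0.50, d = 0.05) and rounds to the nearest whole number, as in the example. The function name and its keyword defaults are illustrative.

```python
import math

def lynch_sample_size(N, d=0.05, Z=1.96, p=0.50):
    """Lynch et al.: n = N Z^2 p(1-p) / (N d^2 + Z^2 p(1-p))."""
    variance_term = Z ** 2 * p * (1 - p)  # Z^2 p(1 - p)
    return N * variance_term / (N * d ** 2 + variance_term)

n = lynch_sample_size(1000)
print(f"n = {n:.2f} or {math.floor(n + 0.5)}")  # n = 277.54 or 278
```

The formula uses p = 0.50 because this value maximizes p(1 - p). When the true proportion is unknown, it therefore gives the most conservative (largest) sample size.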
Pointers in Sample Size Determination
❖ When the population N is large, a small sampling percentage
is recommended; when the population is small, a
larger percentage is necessary.
❖ The sample size should preferably be no smaller
than 30.
Stratum   Stratum size   Equal allocation   Proportional allocation
2         15             6                  5
3         20             6                  8
Total     N = 50         n = 18             n = 18
Example:
Course           No. of Enrollees
MAED Math        40
MAED English     66
MAED Science     78
MAED ECD         34
MAED Reading     27
MAN              44
Total            289
Using Slovin's formula (with e = 0.05), the sample size is n = 167.78, or 168.
Course           No. of Enrollees   Equal Allocation   Proportional Allocation
MAED Math        40                 28                 23
MAED English     66                 28                 38
MAED Science     78                 28                 45
MAED ECD         34                 28                 20
MAED Reading     27                 28                 16
MAN              44                 28                 26
Total            289                168                168
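A short Python sketch can reproduce the two allocation columns. Equal allocation divides n by the number of strata. Proportional allocation computes n_i = n * N_i / N for each stratum. The rounding rule used here is an assumption, not something the source specifies: it is the largest-remainder method, chosen so the shares always total exactly n. For this example it gives the same figures as ordinary rounding.

```python
import math

def equal_allocation(strata, n):
    """Give every stratum the same share of n (any remainder goes to the first strata)."""
    base, extra = divmod(n, len(strata))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(strata)}

def proportional_allocation(strata, n):
    """n_i = n * N_i / N, rounded by largest remainder so the shares sum to n."""
    N = sum(strata.values())
    exact = {name: n * size / N for name, size in strata.items()}
    alloc = {name: math.floor(q) for name, q in exact.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc

enrollees = {"MAED Math": 40, "MAED English": 66, "MAED Science": 78,
             "MAED ECD": 34, "MAED Reading": 27, "MAN": 44}
n = 168
equal = equal_allocation(enrollees, n)
prop = proportional_allocation(enrollees, n)
for course, size in enrollees.items():
    print(f"{course:<14}{size:>4}{equal[course]:>6}{prop[course]:>6}")
```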
1. Purposive/judgmental sampling.
Here the researcher uses his good
judgment in selecting the respondents who best
meet the purposes of his study.
2. Quota sampling
is the non-probability equivalent of stratified
sampling with the added requirement that each substratum
is generally represented in the sample in the same
proportion as in the population.
For example, if sex is related to marked differences in the
characteristic one wishes to measure, then one quota should be
allotted for males and another for females, sized to reflect the
proportions of males and females in the population.
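The following minimal sketch shows how quotas are filled. It uses hypothetical figures: a population that is 60% female and 40% male, a target sample of 10, and a made-up order in which respondents become available. Each quota mirrors its group's population share. Respondents are then simply accepted as they come until their group's quota is full. There is no random selection, which is what makes quota sampling a non-probability method.

```python
def quota_sample(arrivals, population_share, n):
    """Fill quotas in the order respondents are met (non-probability selection)."""
    # Each group's quota reflects its share of the population.
    # Note: for awkward shares, rounding can make the quotas total n +/- 1.
    quotas = {group: round(n * share) for group, share in population_share.items()}
    chosen = {group: [] for group in quotas}
    for respondent, group in arrivals:
        if group in chosen and len(chosen[group]) < quotas[group]:
            chosen[group].append(respondent)
        if all(len(chosen[g]) == quotas[g] for g in quotas):
            break  # every quota is full, so stop interviewing
    return chosen

# Hypothetical order in which respondents are encountered: (id, sex)
arrivals = [(f"R{i}", sex) for i, sex in enumerate("FFMFFFMFFMFMMF", start=1)]
sample = quota_sample(arrivals, {"F": 0.60, "M": 0.40}, n=10)
print(sample)
```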
3. Convenience/accidental sampling.