Basic Statistical Techniques in Research: April 2018
All content following this page was uploaded by Musibau Adetunji Babatunde on 19 April 2018.
F.A. Adesoji & M.A. Babatunde
CHAPTER 1
Introduction
The essence of research is to solve problems. In order to do this,
we have to select a plan for the study, which is referred to as the
"design of the experiment". In this sense, research can be regarded
as the application of the scientific method to the study of a
problem. Questions or hypotheses are formulated in an attempt to
find a solution to the problem at hand. In order to gather
information or data, instruments must be in place. The data
collected remain raw, or untreated, data of little use until they are
analyzed using appropriate statistical tools.
The design of experiment is inseparable from the statistical
treatment of the results. If the design of an experiment is faulty, no
amount of statistical manipulation can lead to the drawing of valid
inference. Experimental design and statistical procedures are two
sides of the same coin. Research typically deals with the
manipulation of variables, which are basically of two types:
numerical and categorical. Numerical variables are recorded as
numbers such as height, age, scores, weight, etc. Categorical
variables could be dichotomy (for example, male or female),
trichotomy (for example, high, medium and low economic status)
or polychotomy (for example, birth places). Statistical techniques
have to do with data generation, manipulation and interpretation.
In order to generate data, measurement is necessary. Measurement
is the ascribing of symbols or figures to entities and it is thus basic
to data generation. Data can be classified as qualitative or
quantitative; quantitative data may be discrete or continuous.
The mean cannot be determined in this case, in contrast with the
scores of a group of students in a course of study. Data can also be
at the nominal level (assigning A, B, C), ordinal (ordered or
ranked), interval (precise differences exist) or ratio level.
For example, between scores of 50% and 51% a meaningful one-
point difference exists, but there is no true zero point; for example,
0°F does not mean no heat at all. At the ratio level, in addition to
the difference between units, there is a true zero point and a true
ratio between values. Note that there is no complete agreement
among statisticians about the classification of data into one of the
four categories. Also, data can be altered so that they fit into a
different category. For example, if you categorize income of
workers into low, average and high, then a ratio variable becomes
an ordinal variable.
1. Standardized value means that a value is expressed in terms of its difference
from the mean, divided by the standard deviation.
2. The normal distribution, also called the Gaussian distribution, is an important
family of continuous probability distributions, applicable in many fields. Each
member of the family may be defined by two parameters: the mean ("average",
μ) and the variance (standard deviation squared, σ²). The standard normal
distribution is the normal distribution with a mean of zero and a variance of
one.
3. The power of a statistical test is the probability that the test will reject a false
null hypothesis.
The t-test
The t-test is the most commonly used method to evaluate the
differences in means between two groups. For example, the t-test
can be used to test for a difference in test scores between a group
of patients who were given a drug and a control group who
received an injection. Theoretically, the t-test can be used even if
the sample sizes are very small (e.g., as small as 10) as long as the
variables are normally distributed within each group and the
variation of scores in the two groups is not reliably different.
The p-level reported with a t-test represents the probability of
error involved in accepting the research hypothesis about the
existence of a difference. It is the probability of error associated
with rejecting the hypothesis of no difference between the two
categories of observations (corresponding to the groups) in the
population when, in fact, the hypothesis is true. If the calculated p-
value is below the threshold chosen for statistical significance
(usually the 0.05 level), then the null hypothesis which usually
states that the two groups do not differ is rejected in favor of an
alternative hypothesis, which typically states that the groups do
differ.
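As a minimal sketch of this procedure (using Python's scipy; the scores below are invented purely for illustration, not taken from any study):

```python
from scipy import stats

# Hypothetical test scores for a drug group and a control group
drug_group = [23, 25, 28, 30, 32, 27, 26, 29, 31, 24]
control_group = [20, 22, 21, 25, 23, 19, 24, 22, 20, 21]

# Independent-samples t-test; assumes approximate normality within
# each group and similar variances across the two groups
t_stat, p_value = stats.ttest_ind(drug_group, control_group)

# Reject the null hypothesis of no difference if p falls below 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

If the two variances cannot be assumed equal, `stats.ttest_ind(..., equal_var=False)` performs Welch's t-test instead.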
Basic Statistical Techniques in Research 11
the test may or may not read more. Alternatively, we might recruit
students with low scores and students with high scores in two
groups and assess their reading amounts independently.
An example of a repeated measures t-test would be if one
group were pre- and post-tested. For example, if a teacher wanted
to examine the effect of a new set of textbooks on student
achievement, he/she could test the class at the beginning of the
year (pre-test) and at the end of the year (post-test). A dependent t-
test would be used, treating the pre-test and post-test as matched
variables (matched by student).
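A dependent (repeated measures) t-test can be sketched the same way, again with invented pre- and post-test scores:

```python
from scipy import stats

# Hypothetical pre- and post-test scores for the same ten students
pre_test  = [55, 60, 48, 72, 65, 58, 50, 63, 70, 45]
post_test = [60, 66, 50, 75, 70, 64, 55, 65, 78, 52]

# Dependent t-test: each pre-test score is matched to the same
# student's post-test score
t_stat, p_value = stats.ttest_rel(pre_test, post_test)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```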
ANOVA/ANCOVA /MCA
The purpose of analysis of variance (ANOVA) is to test
differences in means (for groups or variables) for statistical
significance. This is accomplished by analyzing the variance, that
is, by partitioning the total variance into the component that is due
to true random error and the components that are due to differences
between means. These latter variance components are then tested
for statistical significance, and, if significant, we reject the null
Basic Statistical Techniques in Research 13
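A one-way ANOVA can be sketched as follows (using scipy, with invented scores for three groups of learners):

```python
from scipy import stats

# Hypothetical scores under three different teaching methods
method_a = [80, 85, 78, 90, 82]
method_b = [70, 68, 75, 72, 71]
method_c = [60, 65, 58, 62, 64]

# One-way ANOVA: the F statistic compares between-group variance
# with within-group (error) variance
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```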
Example
A researcher is interested in examining whether the age of a set of
workers could affect their income and hours worked per week, the
two dependent variables being income and hours worked per week.
The three tables below show the Multivariate Analysis of Variance.
Regression Analysis
Regression analysis is a statistical technique used for the modeling
and analysis of numerical data consisting of values of a dependent
variable (response variable) and of one or more independent
variables (explanatory variables). The dependent variable in the
regression equation is modeled as a function of the independent
variables, corresponding parameters (constants), and an error term.
The error term is treated as a random variable. It represents
unexplained variation in the dependent variable. The parameters
are estimated so as to give a best fit of the data. Most commonly
the best fit is evaluated by using the least squares method, but
other criteria have also been used. For some kinds of research
questions, regression can be used to examine how much a
particular set of predictors explains differences in some outcome.
In other cases, regression is used to examine the effect of some
specific factor while accounting for other factors that influence the
outcome.
Regression analysis requires assumptions to be made
regarding probability distribution of the errors. Statistical tests are
made on the basis of these assumptions. In regression analysis the
term model embraces both the function used to model the data and
the assumptions concerning probability distributions. Regression
can be used for prediction (including forecasting of time-series
data), inference, hypothesis testing, and modeling of causal
relationships. These uses of regression rely heavily on the
underlying assumptions being satisfied. Regression analysis has
been criticized as being misused for these purposes in many cases
where the appropriate assumptions cannot be verified to hold.
The underlying assumptions in regression analysis are that:
The sample must be representative of the population for the
inference or prediction.
The dependent variable is subject to error. This error is
assumed to be a random variable with a mean of zero.
The independent variables are error-free.
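A minimal least-squares fit, using numpy and invented data (hours studied predicting an exam score), illustrates the parameter estimates and the error term:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 65, 70, 74, 78, 85], dtype=float)

# Design matrix with an intercept column; least squares chooses the
# parameters that minimize the sum of squared errors
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Residuals estimate the error term; with an intercept they sum to zero
errors = y - X @ beta
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```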
Path Analysis
This is an extension of multiple regression analysis. The use of
path analysis enables the researcher to calculate the direct and
indirect influence of independent variables on a dependent
variable. These influences are reflected on the path coefficients,
which are actually standardized regression coefficients (Beta
weights). Path analysis is one of the techniques for the study and
analysis of causal relations in ex-post facto research.
Factor Analysis
Factor analysis is useful in reducing a mass of information to a
manageable and economical description. For example, data on fifty
characteristics for 300 states are unwieldy to handle, descriptively
or analytically. Reducing them to their common factor patterns
facilitates the management, analysis and understanding of such
data. These factors concentrate and index characteristics without
much loss of information. States can be more easily discussed and
compared on economic, development, size and public dimensions
rather than on the hundreds of characteristics each dimension
involves.
A Hypothetical Situation
Suppose a 25-item questionnaire on students' attitude towards
Physics was administered to 40 students. Factor analysis could be
carried out to find out the commonalities of the test items such that
the 25 items would be reduced to a fewer number of items and the
instrument would still be able to measure validly and reliably the
construct attitudes towards Physics. Also, the scale items could be
sorted into their various components so that the items which
correlate highly with one another are grouped together.
Table 11 shows the initial eigenvalues, which provide
information on the percentage of variance explained by each of the
variables. It could be observed that, out of the 25 items, the first 9
account for 76.75% of the total variance. The 25 items have thus
been reduced to 9, and the 9 items could be assumed to measure
the construct which the 25 items were designed to measure. This
shows that, since the 9 items account for 76.75% of the total
variance, if items 10-25 are dropped, no serious harm would be
done to the scale of measurement.
The analysis was carried out to establish the number of
meaningful factors. Nine factors have thus been found to be
meaningful or nontrivial. These are the factors considered as
peculiar factors perceived by the students as their attitudes toward
Physics.
Names are usually given to the isolated factors, and different items
are loaded on each factor.
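The item-reduction idea can be sketched with numpy on simulated data (the responses below are generated from a few hidden factors purely for illustration; a real analysis would use the observed questionnaire responses):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 40 students x 25 items driven by 3 underlying factors plus noise
n_students, n_items, n_factors = 40, 25, 3
loadings = rng.normal(size=(n_items, n_factors))
factor_scores = rng.normal(size=(n_students, n_factors))
responses = factor_scores @ loadings.T \
    + 0.5 * rng.normal(size=(n_students, n_items))

# Eigenvalues of the correlation matrix; by the Kaiser criterion,
# factors with an eigenvalue above 1 are retained as meaningful
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
retained = int(np.sum(eigenvalues > 1))
explained = 100 * eigenvalues[:retained].sum() / eigenvalues.sum()
print(f"{retained} factors retained, explaining {explained:.1f}% of variance")
```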
Correlations
Correlation is a measure of the relation between two or more
variables. It indicates the strength and direction of a linear
relationship between two random variables. The correlation is 1 in
the case of a perfect increasing linear relationship. Spearman's rank
correlation coefficient can be used for variables measured at the
ordinal level, unlike the Pearson product-moment correlation
coefficient. However,
Spearman's correlation coefficient does assume that subsequent
ranks indicate equidistant positions on the variable measured.
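Both coefficients can be computed with scipy (the paired values below are invented):

```python
from scipy import stats

# Hypothetical paired measurements
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 6, 6, 9, 11, 12, 14]

# Pearson measures the strength of the linear relationship
r, p_pearson = stats.pearsonr(x, y)

# Spearman correlates the ranks, so it also suits ordinal data
rho, p_spearman = stats.spearmanr(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```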
Hypotheses
1. There is no significant main effect of treatment on:
i. Learning outcome in Chemistry
ii. Attitude to Chemistry
2. There is no significant main effect of ability on:
i. Learning outcome in Chemistry
ii. Attitude to Chemistry
3. There is no significant main effect of gender on:
i. Learning outcome in Chemistry
ii. Attitude to Chemistry
4. There is no significant interaction effect of treatment and
gender on:
i. Learning outcome in Chemistry
ii. Attitude to Chemistry
Interpretation
The topic shows the effects of two methods of instruction
(independent variables) on two dependent variables (learning
outcome in Chemistry and attitude to Chemistry). There should be
a control group. The design then becomes a pretest-posttest,
control-group experimental design. Therefore, the appropriate
statistical tool is Analysis of Covariance (ANCOVA) with pretest
scores as covariate. Two categorical variables are being investigated:
gender (dichotomous) and academic ability level (trichotomous). And
because of the interaction hypotheses, there must be a factorial design.
Here it is 3 x 2 x 3, which is interpreted as follows: treatment at 3
levels, gender at 2 levels and academic ability at 3 levels.
Interpretation
Other hypotheses could be based on other variables investigated in
the study. Here, the nature of the data generated cannot be purely
numerical. Therefore, non-parametric statistics is the best bet. The
candidate could use the Chi-square statistic for analyzing the data
collected.
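A chi-square test of independence on a made-up contingency table (gender by response category; the counts are purely illustrative) might look like:

```python
from scipy import stats

# Hypothetical contingency table: gender (rows) by response category (columns)
observed = [[30, 10, 20],
            [20, 25, 15]]

# Chi-square test of independence between the two categorical variables;
# the function also returns the degrees of freedom and expected counts
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
```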
Interpretation
This is a descriptive survey research and, based on the language of
the topic and hypotheses, it is a relational study. The hypotheses
could be tested by Pearson Product Moment Correlation or the Chi-
square statistic. If ranking is involved, the candidate could still
make use of the Spearman rank-order correlation.