Chap 4 Research Method and Technical Writing
Statistics in Research
&
Processing and Data Analysis
Introduction
Types of Statistics
Descriptive
Inferential
Introduction
Collected data are nothing more than a
group of numbers until they are analyzed.
Statistical analysis converts those numbers into
meaningful conclusions.
What is statistics? The collecting,
summarizing, and analyzing of data.
Descriptive & Inferential Statistics
Descriptive Statistics Inferential Statistics
Three different types of descriptive statistics:
measures of frequency (shown with bar charts, line
charts, histograms, and pie charts),
measures of central tendency (mean, median,
and mode), and
measures of variability (spread or dispersion).
Organizing data
Tables
Frequency Distributions
Relative Frequency Distributions
Graphs
Bar Chart or Histogram
Line Charts
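As a minimal sketch of the first two organizing steps listed above, the snippet below builds a frequency distribution and a relative frequency distribution. The `responses` data and all variable names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from the slides).
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Frequency distribution: how many times each value occurs.
frequencies = Counter(responses)

# Relative frequency distribution: each count divided by the total count.
total = len(responses)
relative_frequencies = {
    value: count / total for value, count in frequencies.items()
}

# Print a small table: value, frequency, relative frequency.
for value, count in frequencies.most_common():
    print(f"{value:<10}{count:>3}{relative_frequencies[value]:>8.2f}")
```

The same counts are what a bar chart would display as bar heights.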
To visualize trends in the data, it is generally useful to plot
the data even before carrying out statistical analysis
Summarizing Data:
Central Tendency (or Groups’ “Middle Values”)
Mean
Median
Mode
Variance
Standard Deviation
Measures of Central Tendency
Mean
The sum of all the scores divided by the number of scores.
Often referred to as the average.
Good measure of central tendency.
Central tendency is simply the location of the middle in a
distribution of scores.
The mean can be misleading because it can be greatly influenced
by extreme scores (very high or very low scores), especially if the
number of participants is small.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
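The following sketch illustrates how a single extreme score can shift the mean when the group is small. The scores are hypothetical; `statistics.mean` from the Python standard library computes the sum of the scores divided by their number.

```python
from statistics import mean

# Hypothetical test scores for a small group (illustrative data).
scores = [70, 72, 74, 76, 78]

# The same group with one very low score added.
scores_with_extreme = scores + [10]

mean_without = mean(scores)              # sum of scores / number of scores
mean_with = mean(scores_with_extreme)    # pulled down by the extreme score

print(f"mean without extreme score: {mean_without:.2f}")
print(f"mean with extreme score:    {mean_with:.2f}")
```

With only six participants, the one low score lowers the mean by more than ten points, even though five of the six scores are unchanged.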
Mode
The most common data point is called the mode.
It may not be at the center of a distribution.
It may give you the most likely experience rather than the
“typical” or “central” experience.
In symmetric distributions, the mean, median, and mode
are the same.
In skewed data, the mean and median lie further toward
the skew than the mode.
Mode…
Although this measure is convenient in that it requires no
calculations, it is easily affected by chance scores,
especially if the study has a small number of participants.
For this reason, the mode does not always give an
accurate picture of the typical behavior of the group and is
not commonly used in research.
Median
The middle value when a variable’s values are ranked in order;
the point that divides a distribution into two equal halves.
When data are listed in order, the median is the point at which
50% of the cases are above and 50% below it.
The median is largely unaffected by outliers, so it describes the
“typical person” better than the mean does when data are
skewed.
If the recorded values for a variable form a symmetric distribution,
the median and mean are identical.
In skewed data, the mean lies further toward the skew than the
median.
The median is commonly used with a small number of scores or when the data
contain extreme scores, known as outliers.
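To make the relationship between the three measures concrete, here is a small sketch on hypothetical right-skewed data (most values small, one very large). It uses only the standard library `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: the long tail is on the high side.
values = [20, 20, 20, 25, 30, 35, 200]

data_mode = mode(values)      # most common value
data_median = median(values)  # middle value of the sorted data
data_mean = mean(values)      # sum of values / number of values

print("mode:  ", data_mode)
print("median:", data_median)
print("mean:  ", data_mean)
```

Here the mode is 20, the median is 25, and the mean is 50: the mean lies furthest toward the skew, the median less so, and the mode not at all.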
Measures of Dispersion
Measures of central tendency are useful to give the
typical behavior of the group.
But the use of measures of central tendency alone may
also obscure some important information.
Two groups can have the same mean even though one
group's scores are all close to the mean while
the other group's scores are more widely dispersed.
We need this additional information on the dispersion, or
variability, of scores, which is given by the range, variance, and
standard deviation.
Range
The spread, or the distance, between the lowest and
highest values of a variable.
To get the range for a variable, you subtract its lowest
value from its highest value.
The range, although easy to calculate, is not commonly
reported in most studies because it is sensitive to
extreme scores and thus is not always a reliable index
of variability.
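A short sketch of the range calculation, using two hypothetical groups that share the same mean but differ in spread. The helper name `value_range` is an assumption made for this example.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

# Two hypothetical groups with the same mean (50) but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

range_a = value_range(group_a)
range_b = value_range(group_b)

# One extreme score is enough to change the range dramatically.
range_a_with_extreme = value_range(group_a + [100])

print(range_a, range_b, range_a_with_extreme)
```

Adding a single score of 100 to the tightly clustered group raises its range from 4 to 52, which is why the range is treated as sensitive to extreme scores.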
Variance
Standard Deviation (SD)
A summary statistic of how much scores vary
from the mean
Square root of the Variance
expressed in the original units of measurement
Represents the average amount of dispersion in
a sample
Standard Deviation (SD)…
It is a more common way of measuring variability
It is a number that shows how scores are spread around
the mean.
The larger the standard deviation, the more variability
there is in a particular group of scores. Conversely, a
smaller standard deviation indicates that the group is
more homogeneous in terms of a particular behavior.
The smaller the standard deviation, the better the mean
captures the behavior of the sample.
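Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
Standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
Here $\mu$ denotes the mean of the $n$ scores.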
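The sketch below computes the variance and standard deviation with a divisor of n, as in the formulas on this slide, and checks the result against `statistics.pvariance` and `statistics.pstdev`. The data and the function names `population_variance` and `population_sd` are hypothetical. Note that many software packages instead default to the sample versions, which divide by n − 1.

```python
from math import sqrt
from statistics import pstdev, pvariance

def population_variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_sd(values):
    """Square root of the variance, in the original units of measurement."""
    return sqrt(population_variance(values))

# Two hypothetical groups with the same mean (50) but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

for name, group in (("A", group_a), ("B", group_b)):
    print(
        f"group {name}: variance = {population_variance(group):.2f}, "
        f"SD = {population_sd(group):.2f}"
    )
```

Both groups have a mean of 50, but the standard deviation of group B is far larger, so the mean describes group A much better than it describes group B.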
Outliers
Outliers are data that seem to be atypical of the rest of the dataset.
The presence of outliers strongly suggests that the researcher
needs to take a careful look at the data and determine whether
the data collected from specific individuals are representative of
the data elicited from the group as a whole.
There are times when researchers may decide not to include
outlier data in the final analysis, but if this is the case there needs
to be a principled reason for not including them beyond the fact
that they "don't fit right."
Should researchers decide that there are principled reasons for
eliminating outlying data, a detailed explanation in the research
report needs to be provided.
Summary of Descriptive statistics
Frequencies, as well as measures of central tendency,
are often presented in studies even when they do
not relate directly to the research questions.
This is because they provide a succinct
summary of the basic characteristics of the data,
allowing readers to understand the nature of the data in
minimal space.
Providing visual representations of results in graphical
form can also contribute to a clearer understanding of any
patterns confirmed through statistical testing and can
provide an early picture of any outliers in the data.
Choosing the right test
Types of Inferential Statistics
Relationships Between Variables
Linear Regression, Pearson's r:
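The general equation for a line is Y = mX + b. The
equation used in linear regression is usually written like this:
Y = a + bX
Here a is the intercept and b is the slope (the roles played by b and m, respectively, in the general form).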
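As an illustration of Y = a + bX and Pearson's r, the sketch below estimates a and b by ordinary least squares, which is the usual fitting method although the slide does not name it. The data (`hours`, `scores`) and the function names are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def pearson_r(xs, ys):
    """Pearson's r: strength and direction of the linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours of study (X) and test score (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
r = pearson_r(hours, scores)
print(f"Y = {a:.1f} + {b:.1f}X, r = {r:.3f}")
```

For these data the fitted line is Y = 47.7 + 4.1X, so each additional hour is associated with about 4.1 more points, and r is close to +1.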
t-tests
The t-test can be used when one wants to determine if
the means of two groups are significantly different from
one another.
There are two types of t-tests: one is used when the
groups are independent, and the other, known as a paired
t-test, is used when the groups are not independent, as
in a pretest/posttest situation where the focus is on change within a
group.
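The two kinds of t-test can be sketched as follows. This is a minimal illustration that computes only the t statistic; it assumes equal variances for the independent case (the pooled-variance form), and all data and function names are hypothetical. Judging significance would still require comparing t with a critical value or computing a p-value, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group1, group2):
    """t statistic for two independent groups, using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error

def paired_t(before, after):
    """Paired t statistic, computed on each participant's difference score."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Independent groups: different participants in each group (hypothetical).
control = [60, 62, 64, 66, 68]
treatment = [65, 67, 69, 71, 73]
t_independent = independent_t(treatment, control)

# Paired design: the same participants measured twice (hypothetical).
pretest = [60, 62, 64, 66, 68]
posttest = [63, 64, 68, 69, 71]
t_paired = paired_t(pretest, posttest)

print(f"independent t = {t_independent:.2f} (df = {len(control) + len(treatment) - 2})")
print(f"paired t      = {t_paired:.2f} (df = {len(pretest) - 1})")
```

The paired version works on the within-person differences, which is why it suits pretest/posttest designs: stable differences between participants cancel out.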
Analysis of variance (ANOVA).
Analysis of Covariance (ANCOVA).
There are times when there might be a preexisting
difference among groups and the variable where
that difference is manifested is related to the
dependent variable.
In other words, differences in means on variable X
will show up on a pretest.
The preexisting difference will need to be controlled
for and is referred to as the covariate.
Multivariate Analysis of Variance (MANOVA)
The MANOVA is part of the family of analyses of
variance.
It differs from an ANOVA in that it has more than one
dependent variable.
In order to appropriately use a multivariate analysis of
variance, there has to be justification for believing that the
dependent variables are related to one another.
Important definitions (Normal Dist. and Standard Score)
NORMAL DISTRIBUTION
A distribution describes the clustering of
scores/behaviors.
In a normal distribution (also known as a bell curve) the
numbers (e.g., scores on a particular test) cluster
around the midpoint.
There is an even and decreasing distribution of scores in
both directions.
Another characteristic of a normal distribution relates to
the standard deviation.
NORMAL DISTRIBUTION
In a normal distribution, approximately 68% of the data lie within 1 standard
deviation of the mean. In other words, about 34% of the data lie between the
mean and one standard deviation above it, and about 34% lie between the mean
and one standard deviation below it.
Extending to two standard deviations above and below the mean captures an
additional 27%, for a total of about 95%.
Finally, approximately 2.1% of the data fall between 2 and 3 standard
deviations on each side of the mean, so about 99.7% of the data lie within 3
standard deviations, leaving only approximately 0.3% beyond that.
If we know that a group of scores is normally distributed and if we know
the mean and the standard deviation, we can then determine where
individuals fall within a group of scores.
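These proportions can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` and the example test with mean 100 and standard deviation 15 are assumptions made for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")

# Locating an individual: on a hypothetical test with mean 100 and SD 15,
# what proportion of the group scores at or below 115?
test_scores = NormalDist(mu=100, sigma=15)
percentile_of_115 = test_scores.cdf(115)
print(f"proportion at or below 115: {percentile_of_115:.1%}")
```

The loop gives roughly 68%, 95%, and 99.7%, and a score one standard deviation above the mean sits at about the 84th percentile.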
STANDARD SCORES
There are times when we want to compare an individual's
performance in different set-ups (for example, scores of 22/75 and 22/50).
One way to make a more meaningful comparison is to
convert these raw scores into standard scores.
One of the most common standard scores is the z score, which
uses standard deviations to reflect the distance of a score
from the mean.
If a score is one standard deviation above the mean, it has a
z score of +1, a score that is two standard deviations above
the mean has a z score of +2, and a score that is one
standard deviation below the mean has a z score of -1.
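A minimal sketch of the z-score conversion follows. The raw score of 22 comes from the example above, but the group means and standard deviations are hypothetical values chosen only to show how the same raw score can sit at different positions in two groups.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# The same raw score (22) on two tests. The means and SDs are hypothetical.
z_test_a = z_score(22, mean=30, sd=8)  # test marked out of 75
z_test_b = z_score(22, mean=18, sd=4)  # test marked out of 50

print(f"test A: z = {z_test_a:+.1f}")
print(f"test B: z = {z_test_b:+.1f}")
```

Under these assumed values, a 22 is one standard deviation below the mean on test A (z = -1) but one standard deviation above it on test B (z = +1), a difference that the raw scores alone do not show.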