MAPC006
Descriptive statistics are particularly useful when dealing with observations subject to variability, such as human behavior and characteristics like attitude, intelligence, and personality, which differ among individuals. This
field encompasses techniques like classification, tabulation, graphical presentation, and
measures of central tendency and variability. Through these methods, researchers can effectively
understand and convey the characteristics and tendencies within a dataset. Parameters derived
from these analyses provide precise definitions of population groups and serve as
comprehensive descriptors of the data distribution. Descriptive statistics are indispensable for
making sense of diverse observations.
Organizing and presenting data effectively is a crucial aspect of statistical analysis. In the
realm of statistics, four major techniques are employed for this purpose: classification,
tabulation, graphical presentation, and diagrammatical presentation.
Classification is the first step in organizing data. It involves grouping data points based
on their similarities. Essentially, classification helps in summarizing the frequency of
individual scores or score ranges for a variable. This process provides a structured
overview of data, aiding in decision-making. The outcome of classification is often a
frequency distribution, which shows how frequently each score or range of scores
occurs.
Frequency distributions can be further divided into two types: ungrouped data and
grouped data. In ungrouped data, all individual score values are listed, typically in
ascending or descending order, and the frequency of each score is tallied. In grouped
data, when there is a wide range of score values, the scores are organized into classes or intervals to provide a clearer overview of the data. Constructing a grouped frequency distribution involves three considerations:
1. Range of Data: This represents the difference between the highest and lowest
scores.
2. Number of Class Intervals: Deciding the appropriate number of classes depends
on the dataset's size and complexity. Generally, the number of classes falls
between 5 and 30.
3. Limits of Each Class Interval: These limits define the width or range of each
class, denoted as 'i'. Class intervals should have uniform widths, which are
usually whole numbers divisible by common factors like 2, 3, 5, 10, or 20.
Three methods are commonly used to define class limits: the exclusive method, in which the upper limit of one class becomes the lower limit of the next; the inclusive method, in which class limits do not overlap, so a score equal to the upper limit belongs to that class; and the true or actual class method, which counts a score as part of a class if it falls within 0.5 units below to 0.5 units above its face value. A short code sketch below puts these steps together.
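Below is a minimal Python sketch, with invented scores and an assumed interval width of i = 5, computing the range, the number of classes, and the frequencies under the exclusive method:

```python
# Building a grouped frequency distribution; the scores and the
# interval width i = 5 are illustrative assumptions.
scores = [12, 15, 22, 9, 31, 27, 18, 24, 35, 14, 29, 21]

low, high = min(scores), max(scores)
data_range = high - low            # 1. range of the data
i = 5                              # 3. chosen class-interval width
n_classes = data_range // i + 1    # 2. resulting number of classes

# Exclusive method: each interval is [lower, lower + i), so a score
# equal to the upper limit falls into the next class.
for k in range(n_classes):
    lower = low + k * i
    upper = lower + i
    freq = sum(lower <= s < upper for s in scores)
    print(f"{lower}-{upper}: {freq}")
```

With these invented scores, the sketch prints six classes (9-14, 14-19, ..., 34-39) with their tallies; under the exclusive method a score of 14 falls in the 14-19 class, not in 9-14.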
A well-constructed statistical table includes the following parts:
1. Table Number: If multiple tables are used, each should be numbered for
reference.
2. Title of the Table: A clear, concise title that describes the table's content.
3. Caption: Captions serve as headings for columns, including both main headings
and sub-headings.
4. Stub: Stubs are concise headings for rows, providing clarity and context.
5. Body of the Table: This section contains the actual numerical data, organized
according to the captions and stubs.
6. Head Note: Information about the units of measurement used in the table.
7. Footnote: Additional explanations or details related to the data.
8. Source of Data: Information about the data's origin or source.
Common diagrammatic presentations include the following (a plotting sketch follows the list):
Bar Diagram: Suitable for categorical data, bar diagrams use bars to represent
frequencies or values of variables, with the height of each bar corresponding to
the frequency.
Sub-divided Bar Diagram: This diagram allows for the visualization of sub-
classifications within a category by dividing and shading bars accordingly.
Multiple Bar Diagram: Used for comparing two or more sets of interrelated
variables, it presents multiple sets of bars side by side, often using different
colors or shades to distinguish between them.
Pie Diagram: Also known as a pie chart, this circular diagram divides a circle into
sectors, each representing a variable's percentage contribution to the total.
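As an illustration, here is a small Python sketch using matplotlib that draws a bar diagram and a pie diagram side by side; the categories and frequencies are invented for the example:

```python
# Bar and pie diagrams with matplotlib; the data are made up.
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
frequencies = [12, 30, 18, 40]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# Bar diagram: bar height corresponds to the frequency.
ax1.bar(categories, frequencies)
ax1.set_title("Bar diagram")

# Pie diagram: each sector shows a category's share of the total.
ax2.pie(frequencies, labels=categories, autopct="%1.1f%%")
ax2.set_title("Pie diagram")

plt.tight_layout()
plt.show()
```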
The bell-shaped curve, technically known as the Normal Probability Curve or simply the Normal Curve, is a
fundamental concept in the field of probability and data analysis. The corresponding frequency
distribution of scores, which exhibits the same values for all three measures of central tendency (Mean,
Median, and Mode), is known as a Normal Distribution. Understanding the Normal Probability Curve and
the Normal Distribution is essential for various scientific disciplines, including the physical, biological,
and behavioral sciences.
The foundation of the Normal Probability Curve is based on the law of probability. This law, also known
as the law of chance or the probable occurrence of certain events, was initially formulated by the French
mathematician Abraham de Moivre in the 18th century.
It is particularly valuable in the analysis and interpretation of data and observations. The Normal
Probability Curve is characterized by several key features (its defining equation is given after the list):
**1. Symmetry:** The Normal Curve is perfectly symmetrical around its vertical axis, referred to as the
ordinate. This symmetry means that the left and right halves of the curve, with the vertical axis at the
center, are mirror images of each other. This creates a bell-shaped appearance, which is a hallmark of
the normal distribution.
**2. Unimodality:** The Normal Curve is unimodal, meaning it has only one mode, or peak. This mode
represents the most frequently occurring value in the distribution, which is also equal to the mean,
median, and mode of the distribution. This concentration of values around the mean contributes to the
characteristic bell shape.
**3. Maximum Ordinate:** The highest point, or maximum ordinate, of the Normal Curve always occurs
at its center, which corresponds to the mean of the distribution. For the standard normal
curve, the maximum ordinate is approximately 0.3989.
**4. Asymptotic to the X-Axis:** The Normal Probability Curve approaches the horizontal axis
asymptotically, meaning it never actually touches the X-axis but continually decreases in height as it
extends away from the central point. This extension reaches from minus infinity (-∞) to plus infinity
(+∞), signifying that the curve encompasses the entire real number line.
**5. Symmetric Decline:** As you move away from the central point in both directions, the height of the
Normal Curve declines symmetrically. This gradual decline on either side contributes to the bell-shaped
distribution.
**6. Points of Inflection:** The Normal Curve has two points of inflection, where it changes from a convex to a
concave shape. These points are significant: if you draw perpendicular lines from them
to the X-axis, they will intersect it at a distance of one standard deviation unit above and below the
mean. This feature provides insight into the spread of data within the distribution.
**7. Percentage of Area within the Two Points of Inflection:** Approximately 68.26% of the total area under the
Normal Curve falls within the limits of ±1 standard deviation (±1σ) unit from the mean. This percentage
represents a crucial aspect of the distribution and is commonly used in statistical analysis.
**8. Total Area as 100 Percent Probability:** The total area under the Normal Curve represents
100 percent probability (a total probability of 1). This interpretation is based on the standard deviations and
represents the probability associated with different intervals or regions within the distribution. The area
between successive standard deviations corresponds to specific percentages of the total area, providing
a way to calculate probabilities for different ranges of values.
**9. Bilaterality:** The Normal Curve is bilateral, meaning that 50% of the area falls to the left of the
maximum central ordinate, while the other 50% lies to the right. This balance in distribution is another
fundamental characteristic of the normal distribution and is crucial for understanding probabilities
associated with different values.
**10. Mathematical Model in Behavioral Sciences:** The Normal Probability Curve serves as a
mathematical model in the behavioral sciences, particularly in mental measurement. It is used as a
reference curve for measurement in areas such as intelligence, achievement, adjustment, anxiety, and
socio-economic status. In this context, the unit of measurement is described as ±1σ (Sigma), signifying
one standard deviation from the mean.
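For reference, these features all follow from the equation of the normal curve, which in standard form is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

For the standard normal curve (μ = 0, σ = 1), the maximum ordinate is f(0) = 1/√(2π) ≈ 0.3989, and the points of inflection lie at z = ±1, consistent with features 3 and 6 above.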
In summary, the Normal Probability Curve, often referred to as the Normal Curve, is a cornerstone of
statistical analysis and probability theory. Its symmetrical, unimodal, and asymptotic characteristics make
it a powerful tool for understanding and analyzing data. The points of inflection, percentage areas, and
mathematical modeling aspects contribute to its significance in various scientific disciplines, particularly
in mental measurement and the behavioral sciences. Understanding the Normal Curve is essential for
researchers, statisticians, and analysts when working with data that follows a normal distribution.
**Assumptions of Parametric Statistics:**
Parametric statistical tests, such as t-tests and analysis of variance (ANOVA), are powerful tools for data
analysis, but they rely on certain assumptions for their validity. To apply parametric tests appropriately, the following conditions should hold (a sketch for checking two of them follows the list):
1. **Normal Distribution:** The population from which the sample is drawn should follow a normal
distribution. Normal distributions have the characteristic bell-shaped curve that extends infinitely in both
directions. This assumption is essential for many parametric tests.
2. **Measurement Scale:** The variables being analyzed must be measured on an interval or ratio scale.
These scales provide quantitative data that maintain the necessary properties for parametric tests.
3. **Independence of Observations:** The observations within the sample should be independent. This
means that selecting or excluding one case from the sample should not unduly affect the results of the
study.
4. **Homoscedasticity:** The populations being compared should have equal variances, or in some
cases, a known ratio of variances. Homoscedasticity ensures that the variability within groups is
consistent.
5. **Equality of Variances in Samples:** When working with small samples, it is crucial that the samples
have equal or nearly equal variances. This condition, known as the homogeneity of variances, is
particularly important for small sample sizes.
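In practice, two of these assumptions are easy to check in software. The sketch below uses SciPy with two invented samples; the Shapiro-Wilk test and Levene's test are one common choice among several:

```python
# Checking normality and homogeneity of variances with SciPy;
# group_a / group_b are illustrative samples, not data from the text.
from scipy import stats

group_a = [4.1, 5.0, 4.8, 5.6, 4.9, 5.3, 4.7]
group_b = [5.9, 6.2, 5.4, 6.8, 6.0, 5.7, 6.3]

# Shapiro-Wilk test for normality (H0: the sample is normal).
for name, g in (("A", group_a), ("B", group_b)):
    stat, p = stats.shapiro(g)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")

# Levene's test for homogeneity of variances (H0: equal variances).
stat, p = stats.levene(group_a, group_b)
print(f"Levene p = {p:.3f}")
```

Small p-values would signal a violation of normality or of homogeneity of variances, respectively.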
Non-parametric statistics are valuable when the assumptions for parametric tests are not met. These
tests are suitable for situations where:
1. **Small Sample Size:** Non-parametric tests are preferable when dealing with very small sample
sizes, often as small as N=5 or N=6. In such cases, parametric tests may not be applicable.
2. **Non-Normal Distribution:** If the distribution of the sample is not normally distributed, non-
parametric tests are more appropriate. These tests do not rely on the assumption of normality.
3. **Nominal or Ordinal Data:** Non-parametric tests are suitable when working with nominal or
ordinal scale data. They are also used when data can be expressed as ranks or in categories like "good"
or "bad."
4. **Population Nature is Unknown:** When the nature of the population from which samples are
drawn is not known to be normal, non-parametric tests can be applied to avoid making unwarranted
assumptions.
5. **Nominal Variables:** When dealing with data expressed solely in nominal form, non-parametric
statistics are the preferred choice.
6. **Ranked or Categorized Data:** Non-parametric tests are used when data are in the form of ranks or
categories (e.g., "+," "-"). These tests can handle the strength of rankings in the data.
**Level of Measurement:**
When selecting a statistical test, it's vital to determine the level of measurement for the dependent
variable. Parametric tests typically require at least interval-level measurement, while non-parametric
methods are applicable to all measurement levels, making them suitable for nominal and ordinal data.
**Nominal Data:** Nominal data, the lowest measurement level, is categorical, consisting of
mutually exclusive named categories with no inherent ordering. Examples include "yes" or "no" and
"male" or "female." No numerical values are assigned to nominal variables.
**Ordinal Data:** Ordinal data, the second measurement level, provides a quantitative order for
variables but doesn't indicate the value of the differences between positions. While you can rank
variables, the differences between ranks lack meaning. Health science research often uses ordinal scales
like pain scales, stress scales, and functional scales.
**Interval and Ratio Data:** Parametric tests require at least interval-level data, characterized by
equidistant divisions between categories. In interval data, zero doesn't signify the absence of value,
making it impossible to conclude that one point is twice as large as another. Ratio data, the highest
measurement level, has equal intervals and a meaningful zero point. Common examples include weight,
blood pressure, and force.
**Sample Size:** Parametric tests assume large sample sizes, but this condition isn't met in many
research studies. Non-parametric techniques are ideal for smaller sample sizes and can provide
adequate statistical power. The definition of a "small" sample size can vary, and the choice between
parametric and non-parametric tests depends on several factors, including sample size and the nature of
the data.
**Normality of the Data:** Parametric tests require the assumption of a normal distribution, but in
practice, it's often challenging to ascertain the distribution of the dependent variable. Non-parametric
statistics do not rely on the assumption of normality and are designed for situations where the
distribution of the variable is unknown. Researchers frequently make the naïve assumption of normality,
even when there is evidence suggesting that non-normal distributions exist in various fields.
The Kruskal-Wallis test is a nonparametric (distribution free) test, and is used when the
assumptions of one-way ANOVA are not met. Both the Kruskal-Wallis test and one-way
ANOVA test whether a continuous dependent variable differs significantly across the groups of a categorical
independent variable (with two or more groups). In the ANOVA, we assume that the dependent
variable is normally distributed and there is approximately equal variance on the scores across
groups. However, when using the Kruskal-Wallis Test, we do not have to make any of these
assumptions. Therefore, the Kruskal-Wallis test can be used for both continuous and ordinal-
level dependent variables. However, like most non-parametric tests, the Kruskal-Wallis Test is
not as powerful as the ANOVA.
Null hypothesis: the samples (groups) are drawn from identical populations.
Alternative hypothesis: at least one of the samples (groups) comes from a different population than the others.
The distribution of the Kruskal-Wallis test statistic approximates a chi-square distribution, with
k-1 degrees of freedom, if the number of observations in each group is 5 or more. If the
calculated value of the Kruskal-Wallis test is less than the critical chi-square value, then the null
hypothesis cannot be rejected. If the calculated value of Kruskal-Wallis test is greater than the
critical chi-square value, then we can reject the null hypothesis and say that at least one of the
samples comes from a different population.
Assumptions
1. We assume that the samples drawn from the population are random.
2. We also assume that the observations are independent of each other.
3. The measurement scale for the dependent variable should be at least ordinal.
The Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA. Non-parametric means that the test doesn't assume your data come from a particular distribution. The H test is used when the assumptions for ANOVA aren't met (such as the assumption of normality). It is sometimes called the one-way ANOVA on ranks, because the ranks of the data values are used in the test rather than the actual data points. The test determines whether the medians of two or more groups are different. Like most statistical tests, you calculate a test statistic and compare it to a distribution cut-off point. The test statistic used here is called the H statistic, and the hypotheses are those stated above: the groups come from identical populations (H0) versus at least one group comes from a different population (H1).
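A minimal sketch of the test in Python, using SciPy with three invented groups; the chi-square comparison mirrors the decision rule described earlier:

```python
# Kruskal-Wallis H test with SciPy; the group samples are illustrative.
from scipy import stats

group1 = [27, 2, 4, 18, 7, 9]
group2 = [20, 8, 14, 36, 21, 22]
group3 = [34, 31, 3, 23, 30, 6]

h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")

# Equivalent decision rule against the chi-square critical value
# with k - 1 = 2 degrees of freedom at alpha = 0.05.
critical = stats.chi2.ppf(0.95, df=2)
print("reject H0" if h_stat > critical else "fail to reject H0")
```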
Skewness and kurtosis are vital characteristics of frequency distributions. Skewness measures the
asymmetry of a distribution, indicating whether scores pile up at one end; greater skewness is typically
accompanied by greater variability. Kurtosis, on the other hand, describes the peakedness or flatness of a
distribution compared with a normal curve. Leptokurtic distributions are more peaked than the normal curve,
with scores clustered in a narrower range, while platykurtic distributions are flatter. A normal curve is
mesokurtic.
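A quick sketch of computing both statistics with SciPy, on an invented positively skewed sample:

```python
# Skewness and excess kurtosis with SciPy; the scores are illustrative.
from scipy.stats import skew, kurtosis

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]

print(f"skewness = {skew(scores):.3f}")    # > 0: positively skewed
# fisher=True gives excess kurtosis: 0 for a mesokurtic (normal) curve,
# positive for leptokurtic, negative for platykurtic distributions.
print(f"excess kurtosis = {kurtosis(scores, fisher=True):.3f}")
```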
In statistics, there are two main approaches to estimating population parameters from sample data.
Point estimation involves using a single value, such as the sample mean (x̄), as an estimate of the
population parameter (μ). For example, if the sample mean is 45.0, it would be considered the point
estimate of the population mean (μ).
Interval estimation aims to provide a range, called a confidence interval, within which the population
parameter is likely to fall. This interval is determined by lower and upper confidence limits, and a
specified level of confidence, such as 95% or 99%, is associated with the estimate.
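A minimal sketch of both kinds of estimate in Python, using an invented sample and the t distribution for a 95% interval:

```python
# Point estimate and 95% confidence interval for the mean;
# the sample values are illustrative.
import numpy as np
from scipy import stats

sample = np.array([42.0, 47.5, 44.1, 46.2, 43.8, 45.9, 44.7, 45.8])

mean = sample.mean()                 # point estimate of mu
sem = stats.sem(sample)              # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1,
                             loc=mean, scale=sem)
print(f"point estimate = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```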
The null hypothesis, H0, represents a theory that has been put forward, either because it is
believed to be true or because it is to be used as a basis for argument, but has not been proved.
Researchers use hypothesis testing to assess whether there is sufficient evidence to reject the
null hypothesis in favor of an alternative hypothesis, which asserts that there is a significant
relationship or effect in the data. The null hypothesis serves as a starting point for scientific
inquiry and statistical analysis in psychological research.
A scatter diagram is a graphical representation used to explore the relationship between two variables.
To create one, follow these steps: First, draw the x and y axes, placing one variable on each axis.
Determine the range of values, typically starting from zero, and make the graph square. Identify pairs of
values from your data, each consisting of one value from each variable. Finally, locate these pairs on the
graph and mark their intersection points with clear dots or symbols.
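These steps translate directly into a short matplotlib sketch; the height/weight pairs are invented for the example:

```python
# Scatter diagram following the steps above; the data are made up.
import matplotlib.pyplot as plt

height = [150, 155, 160, 165, 170, 175, 180]   # one variable per axis
weight = [50, 54, 57, 63, 66, 72, 75]

fig, ax = plt.subplots(figsize=(5, 5))          # square graph
ax.scatter(height, weight)                      # mark each (x, y) pair
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title("Scatter diagram of height vs weight")
plt.show()
```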
Outliers are extreme scores on one or both of the variables. The presence of outliers can distort the
correlation value: both the strength and the degree of the correlation are affected. Suppose you want to
compute the correlation between height and weight, which are known to correlate positively, but one
case has a low score on weight and a high score on height (perhaps a patient with anorexia).
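The effect is easy to demonstrate numerically. The sketch below, with invented data, computes Pearson's r before and after adding one such outlying case:

```python
# How a single outlier can distort a correlation; data are illustrative.
import numpy as np

height = np.array([150, 155, 160, 165, 170, 175, 180])
weight = np.array([50, 54, 57, 63, 66, 72, 75])

r_clean = np.corrcoef(height, weight)[0, 1]

# Add one case with a high height but a very low weight.
height_out = np.append(height, 185)
weight_out = np.append(weight, 40)
r_outlier = np.corrcoef(height_out, weight_out)[0, 1]

print(f"r without outlier = {r_clean:.3f}")   # strong positive
print(f"r with outlier    = {r_outlier:.3f}") # noticeably weakened
```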
The biserial correlation coefficient (rb) measures the correlation between two continuous variables, one
of which is measured dichotomously. This is useful when one variable, like mood, is theoretically
continuous but measured as a binary variable. The formula for rb resembles that of the point-biserial
correlation, and you can derive rb from rpb (point-biserial correlation). It calculates the relationship
between these variables based on the means, proportions, and standard deviations of the data pairs for
the dichotomized and continuous variables.
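A commonly cited form of these formulas, where M_p and M_q are the means of the continuous variable in the two groups, p and q their proportions, s_t the overall standard deviation, and y the ordinate of the standard normal curve at the point dividing p and q:

```latex
r_{pb} = \frac{M_p - M_q}{s_t}\,\sqrt{pq},
\qquad
r_b = \frac{M_p - M_q}{s_t}\cdot\frac{pq}{y} = r_{pb}\,\frac{\sqrt{pq}}{y}
```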
Variance in statistics represents the degree of dispersion in a dataset, signifying the distances of
individual scores from the mean. It is computed as the sum of squared deviations from the mean, divided
by the degrees of freedom (N − 1). Variance is an essential measure of group variability and is distinct from
the standard deviation (S.D.), which is its square root. Characteristics of variance include its
role in quantifying variability within and between groups, its always non-negative value, and its nature as an
area measure rather than a directional one. Adding or subtracting a constant to every score in a dataset does not
change its variance.
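A short Python sketch of these points, with invented scores (ddof=1 gives the N − 1 divisor):

```python
# Variance and its relation to the standard deviation; data illustrative.
import numpy as np

scores = np.array([4, 7, 6, 3, 9, 5, 8])

var = np.var(scores, ddof=1)
sd = np.sqrt(var)               # S.D. is the square root of the variance
print(f"variance = {var:.3f}, sd = {sd:.3f}")

# Adding a constant shifts every score but leaves variance unchanged.
print(np.isclose(np.var(scores + 10, ddof=1), var))   # True
```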
In the context of two-way analysis of variance, the term "interaction" or "interactional effect" is crucial. It
pertains to the combined impact of two or more independent variables on a dependent variable.
Understanding these interactions is essential for assessing how different variables jointly affect the
dependent variable. For instance, in agriculture, combining two fertilizers might dramatically affect crop
growth, either positively or negatively. Similarly, in psychology and education, combining two teaching
methods may yield more significant results in academic achievement or none at all.
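One way to test such an interaction is a two-way ANOVA with an interaction term. The sketch below uses statsmodels with an invented fertilizer data set; the column names and model formula are assumptions for the example:

```python
# Two-way ANOVA with an interaction term using statsmodels;
# the fertilizer data frame is an illustrative assumption.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "fert_a": ["no", "no", "yes", "yes"] * 4,
    "fert_b": ["no", "yes", "no", "yes"] * 4,
    "growth": [10, 12, 11, 20, 9, 13, 12, 21,
               11, 12, 10, 19, 10, 14, 11, 22],
})

# '*' expands to both main effects plus the interaction term
# C(fert_a):C(fert_b), which captures their combined impact.
model = ols("growth ~ C(fert_a) * C(fert_b)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

A small p-value on the C(fert_a):C(fert_b) row would indicate that the two fertilizers' effects are not simply additive.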
The Wilcoxon Matched Pair signed-ranks test is a non-parametric method used for paired data with at
least ordinal measurement. Unlike the sign test, it considers the magnitude of differences between
paired observations. It involves obtaining differences within each pair, ranking these differences,
summing the ranks of negative differences and positive differences separately, and then calculating the
test statistic, T, as the smaller of these two sums. This test determines if the null hypothesis of no
difference between paired observations holds.
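SciPy implements this procedure directly. A minimal sketch with invented before/after scores:

```python
# Wilcoxon matched-pairs signed-ranks test with SciPy;
# the before/after scores are illustrative paired observations.
from scipy import stats

before = [72, 68, 75, 80, 66, 74, 71, 69]
after  = [75, 70, 74, 86, 70, 79, 73, 70]

# SciPy ranks the paired differences internally and reports T,
# the smaller of the two signed-rank sums, as the statistic.
t_stat, p_value = stats.wilcoxon(before, after)
print(f"T = {t_stat}, p = {p_value:.4f}")
```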