Assignment 11-23-24
STA. 603
SATURDAY 7:00 – 10:00 AM
ASSIGNMENT DATED NOVEMBER 23, 2024
CORRELATION COEFFICIENT
1. PEARSON
In statistics, the Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-
moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the
linear correlation between two variables X and Y. According to the Cauchy–Schwarz inequality it has a
value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1
is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson
from a related idea introduced by Francis Galton in the 1880s, and the mathematical formula for it
was derived and published by Auguste Bravais in 1844. The naming of the coefficient is thus an
example of Stigler's law.
DEFINITION
Pearson's correlation coefficient is the covariance of the two variables divided by the product of
their standard deviations. The form of the definition involves a "product moment", that is, the mean
(the first moment about the origin) of the product of the mean-adjusted random variables; hence the
modifier product-moment in the name.
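To make the definition concrete, the following is a minimal Python sketch (the sample data and variable names are hypothetical) that computes r directly as the covariance divided by the product of the standard deviations, and checks the result against numpy's built-in corrcoef:

    import numpy as np

    # Hypothetical paired observations of two variables X and Y.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Pearson's r: covariance of X and Y divided by the product of their
    # standard deviations. Population forms (ddof = 0) are used throughout,
    # so the 1/n factors cancel consistently.
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    r = cov_xy / (x.std() * y.std())

    print(r)                        # direct product-moment computation
    print(np.corrcoef(x, y)[0, 1])  # numpy's built-in; the two should agree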
[Table omitted: critical values of Pearson’s correlation coefficient that must be exceeded to be
considered significantly nonzero at the 0.05 level.]
2. INTRACLASS CORRELATION
In statistics, the intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive
statistic that can be used when quantitative measurements are made on units that are organized into
groups. It describes how strongly units in the same group resemble each other. While it is viewed as
a type of correlation, unlike most other correlation measures it operates on data structured as groups,
rather than data structured as paired observations.
The intraclass correlation is commonly used to quantify the degree to which individuals with a fixed
degree of relatedness (e.g. full siblings) resemble each other in terms of a quantitative trait
(see heritability). Another prominent application is the assessment of consistency or reproducibility
of quantitative measurements made by different observers measuring the same quantity.
Relationship to Pearson’s correlation coefficient
In terms of its algebraic form, Fisher's original ICC is the ICC that most resembles the Pearson
correlation coefficient. One key difference between the two statistics is that in the ICC, the data are
centered and scaled using a pooled mean and standard deviation, whereas in the Pearson correlation,
each variable is centered and scaled by its own mean and standard deviation. This pooled scaling for
the ICC makes sense because all measurements are of the same quantity (albeit on units in different
groups). For example, in a paired data set where each "pair" is a single measurement made for each
of two units (e.g., weighing each twin in a pair of identical twins) rather than two different
measurements for a single unit (e.g., measuring height and weight for each individual), the ICC is a
more natural measure of association than Pearson's correlation.
An important property of the Pearson correlation is that it is invariant to separate linear
transformations (with positive slope) applied to the two variables being compared. Thus, if we are
correlating X and Y, where, say, Y = 2X + 1, the Pearson correlation between X and Y is 1, a perfect
correlation. This property
does not make sense for the ICC, since there is no basis for deciding which transformation is applied
to each value in a group. However, if all the data in all groups are subjected to the same linear
transformation, the ICC does not change.
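To make the contrast with Pearson's r concrete, here is a minimal Python sketch of an ICC in the spirit of Fisher's original form for paired data; the data are hypothetical, and modern ICC estimators (e.g., ANOVA-based ones) differ in detail:

    import numpy as np

    # Hypothetical paired measurements, e.g. a trait measured on each member
    # of N twin pairs (column 0 = first twin, column 1 = second twin).
    pairs = np.array([[10.2, 10.8],
                      [ 9.5,  9.9],
                      [11.1, 11.4],
                      [ 8.7,  8.5],
                      [10.0, 10.3]])

    def icc_pooled(pairs):
        # Center and scale every value by the POOLED mean and variance
        # computed over all 2N measurements, not per column.
        n = pairs.shape[0]
        xbar = pairs.mean()                  # pooled mean
        s2 = np.mean((pairs - xbar) ** 2)    # pooled variance
        return np.sum((pairs[:, 0] - xbar) * (pairs[:, 1] - xbar)) / (n * s2)

    print(icc_pooled(pairs))
    # Applying the SAME linear transformation to all values leaves the
    # ICC unchanged, as noted above:
    print(icc_pooled(3.0 * pairs + 7.0))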
3. RANK CORRELATION
In statistics, a rank correlation is any of several statistics that measure an ordinal association—the
relationship between rankings of different ordinal variables or different rankings of the same variable,
where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different
observations of a particular variable. A rank correlation coefficient measures the degree of similarity
between two rankings, and can be used to assess the significance of the relation between them. For
example, two common nonparametric significance tests that use rank correlation are the Mann–Whitney
U test and the Wilcoxon signed-rank test.
Context
If, for example, one variable is the identity of a college basketball program and another variable is the
identity of a college football program, one could test for a relationship between the poll rankings of
the two types of programs: do colleges with a higher-ranked basketball program tend to have a higher-
ranked football program? A rank correlation coefficient can measure that relationship, and the
significance of the rank correlation coefficient can show whether the measured relationship is
small enough that it is likely to be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different
poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different
polls' rankings can be measured with a rank correlation coefficient.
As another example, in a contingency table with low income, medium income, and high income as the
row variable and educational level (no high school, high school, university) as the column
variable, a rank correlation measures the relationship between income and educational level.
Some of the more popular rank correlation statistics include:
1. Spearman's ρ
2. Kendall's τ
3. Goodman and Kruskal's γ
4. Somers' D
An increasing rank correlation coefficient implies increasing agreement between rankings. The
coefficient is inside the interval [−1, 1] and assumes the value:
• 1 if the agreement between the two rankings is perfect; the two rankings are the same.
• 0 if the rankings are completely independent.
• −1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the
other.
Following Diaconis (1988), a ranking can be seen as a permutation of a set of objects. Thus, we can
look at observed rankings as data obtained when the sample space is (identified with) a symmetric
group. We can then introduce a metric, making the symmetric group into a metric space. Different
metrics will correspond to different rank correlations.
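As a brief illustration, the following Python sketch computes two of the statistics listed above, Spearman's ρ and Kendall's τ, for two hypothetical polls ranking the same six programs:

    from scipy.stats import kendalltau, spearmanr

    # Hypothetical ranks (1 = best) assigned to six football programs
    # by two different polls.
    coaches_poll = [1, 2, 3, 4, 5, 6]
    writers_poll = [2, 1, 3, 5, 4, 6]

    rho, rho_p = spearmanr(coaches_poll, writers_poll)
    tau, tau_p = kendalltau(coaches_poll, writers_poll)

    print(f"Spearman's rho = {rho:.3f} (p = {rho_p:.3f})")
    print(f"Kendall's tau  = {tau:.3f} (p = {tau_p:.3f})")
    # Both coefficients lie in [-1, 1]: identical rankings give 1,
    # exactly reversed rankings give -1.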
4. TETRACHORIC CORRELATION
Tetrachoric correlation is used to measure rater agreement for binary data, i.e., data with only two
possible values (usually right or wrong). The tetrachoric correlation estimates what the correlation
would be if the trait were measured on a continuous scale. It is used for a variety of reasons, including
analysis of scores in Item Response Theory (IRT) and converting comorbidity statistics to correlation
coefficients. This type of correlation has the advantage that it is not affected by the number of rating
levels or by the marginal proportions for rating levels.
The term “tetrachoric correlation” comes from the tetrachoric series, a numerical method used before
the advent of computers. While it is more common today to estimate the correlation with methods like
maximum likelihood estimation, there is a basic approximation formula you can use, sketched below.
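One commonly cited version of that basic formula is the “cosine-pi” approximation, r ≈ cos(π / (1 + √(ad/bc))), where a, b, c, d are the four cell counts of the 2×2 agreement table (a and d the agreement cells). A minimal Python sketch with hypothetical counts:

    import math

    # Hypothetical 2x2 agreement table for two raters scoring the same
    # items right (1) or wrong (0):
    #                  rater B = 1   rater B = 0
    #   rater A = 1        a             b
    #   rater A = 0        c             d
    a, b, c, d = 30, 5, 7, 28

    # Cosine-pi approximation to the tetrachoric correlation.
    # (It breaks down if b or c is zero; maximum likelihood estimation
    # is preferred in practice.)
    r_tet = math.cos(math.pi / (1.0 + math.sqrt(a * d / (b * c))))
    print(r_tet)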
5. POLYCHORIC CORRELATION
In statistics, polychoric correlation is a technique for estimating the correlation between two
theorised normally distributed continuous latent variables, from two observed ordinal
variables. Tetrachoric correlation is a special case of the polychoric correlation applicable when both
observed variables are dichotomous. These names derive from the polychoric and tetrachoric series,
mathematical expansions that were once used to estimate these correlations but have since been
superseded by direct numerical methods.
This technique is frequently applied when analysing items on self-report instruments such
as personality tests and surveys that often use rating scales with a small number of response options
(e.g., strongly disagree to strongly agree). The smaller the number of response categories, the more
the correlation between the observed variables tends to understate (attenuate) the correlation between
the latent continuous variables. Lee, Poon & Bentler (1995)
have recommended a two-step approach to factor analysis for assessing the factor structure of tests
involving ordinally measured items. This aims to reduce the effect of statistical artifacts, such as the
number of response scales or skewness of variables leading to items grouping together in factors.
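As an illustration of this two-step idea (step 1: latent thresholds from the marginal proportions; step 2: ρ by maximum likelihood over the bivariate-normal cell probabilities), here is a rough Python sketch. The contingency table and the use of ±8 as a stand-in for infinite thresholds are illustrative assumptions, not a reference implementation:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import multivariate_normal, norm

    # Hypothetical contingency table of two ordinal items
    # (rows = categories of item 1, columns = categories of item 2).
    table = np.array([[20.0,  8.0,  2.0],
                      [10.0, 25.0, 10.0],
                      [ 3.0, 12.0, 30.0]])

    def thresholds(margin_counts):
        # Step 1: cut points on the latent standard normal, taken from
        # cumulative marginal proportions; +/-8 stands in for infinity.
        cum = np.cumsum(margin_counts) / margin_counts.sum()
        return np.concatenate(([-8.0], norm.ppf(cum[:-1]), [8.0]))

    row_cuts = thresholds(table.sum(axis=1))
    col_cuts = thresholds(table.sum(axis=0))

    def neg_log_lik(rho):
        # Step 2: multinomial log-likelihood of the observed cell counts,
        # with cell probabilities from a bivariate normal with correlation rho.
        bvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
        F = lambda x, y: bvn.cdf([x, y])
        ll = 0.0
        for i in range(table.shape[0]):
            for j in range(table.shape[1]):
                p = (F(row_cuts[i + 1], col_cuts[j + 1])
                     - F(row_cuts[i], col_cuts[j + 1])
                     - F(row_cuts[i + 1], col_cuts[j])
                     + F(row_cuts[i], col_cuts[j]))
                ll += table[i, j] * np.log(max(p, 1e-12))
        return -ll

    result = minimize_scalar(neg_log_lik, bounds=(-0.99, 0.99), method="bounded")
    print("polychoric rho estimate:", result.x)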