Nonparametric Statistics

Nonparametric statistics make fewer assumptions than parametric statistics about the underlying probability distributions of data. They do not assume a particular distribution family or specify distribution parameters. Nonparametric methods include distribution-free techniques as well as methods where the model structure is determined from data rather than specified beforehand. Common nonparametric tests include rank-based tests like the Mann-Whitney U test and Kolmogorov-Smirnov test. Nonparametric methods have wider applicability since fewer assumptions are made, but they typically require larger sample sizes to draw the same conclusions as parametric methods.

Uploaded by

joseph676

Nonparametric statistics

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of
probability distributions (common examples of parameters are the mean and variance). Nonparametric
statistics is based on either being distribution-free or having a specified distribution but with the
distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and
statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are
violated.[1]

Definitions
The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others:

1. The first meaning of nonparametric covers techniques that do not rely on data belonging to
any particular parametric family of probability distributions.

These include, among others:

distribution-free methods, which do not rely on assumptions that the data are drawn from
a given parametric family of probability distributions. As such it is the opposite of
parametric statistics.
nonparametric statistics (a statistic is defined to be a function on a sample; no
dependency on a parameter).

Order statistics, which are based on the ranks of observations, are one example of such
statistics.

The following discussion is taken from Kendall's Advanced Theory of Statistics.[2]

Statistical hypotheses concern the behavior of observable random variables....


For example, the hypothesis (a) that a normal distribution has a specified mean
and variance is statistical; so is the hypothesis (b) that it has a given mean but
unspecified variance; so is the hypothesis (c) that a distribution is of normal form
with both mean and variance unspecified; finally, so is the hypothesis (d) that two
unspecified continuous distributions are identical.

It will have been noticed that in the examples (a) and (b) the distribution
underlying the observations was taken to be of a certain form (the normal) and
the hypothesis was concerned entirely with the value of one or both of its
parameters. Such a hypothesis, for obvious reasons, is called parametric.

Hypothesis (c) was of a different nature, as no parameter values are specified in
the statement of the hypothesis; we might reasonably call such a hypothesis
non-parametric. Hypothesis (d) is also non-parametric but, in addition, it does not
even specify the underlying form of the distribution and may now be reasonably
termed distribution-free. Notwithstanding these distinctions, the statistical
literature now commonly applies the label "non-parametric" to test procedures
that we have just termed "distribution-free", thereby losing a useful classification.

2. The second meaning of non-parametric covers techniques that do not assume that the
structure of a model is fixed. Typically, the model grows in size to accommodate the
complexity of the data. In these techniques, individual variables are typically assumed to
belong to parametric distributions, and assumptions about the types of connections among
variables are also made. These techniques include, among others:
non-parametric regression, which is modeling whereby the structure of the relationship
between variables is treated non-parametrically, but where nevertheless there may be
parametric assumptions about the distribution of model residuals.
non-parametric hierarchical Bayesian models, such as models based on the Dirichlet
process, which allow the number of latent variables to grow as necessary to fit the data,
but where individual variables still follow parametric distributions and even the process
controlling the rate of growth of latent variables follows a parametric distribution.
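
As an illustration of non-parametric regression, the following is a minimal sketch of a Nadaraya-Watson kernel estimator, one common choice among several. The Gaussian kernel, the bandwidth value, and the sample data are assumptions made for the example and do not come from the article. No functional form for the relationship is supplied; the estimate at a point is a locally weighted average of the observed responses.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: a kernel-weighted average of observed responses."""
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query point is too far from all training points")
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Hypothetical noiseless data from y = x**2; the estimator is never told this form.
x_train = [i / 10.0 for i in range(-20, 21)]
y_train = [x * x for x in x_train]

# Estimate the response at x = 1.0 (the true value is 1.0).
estimate = kernel_regression(x_train, y_train, x_query=1.0, bandwidth=0.1)
print(round(estimate, 3))
```

The fitted "model" here is the whole training set plus a bandwidth, so its effective size grows with the data, which is the sense of non-parametric described in the second meaning above.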

Applications and purpose


Non-parametric methods are widely used for studying populations that have a ranked order (such as
movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when
data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of
levels of measurement, non-parametric methods are suited to ordinal data.
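
A small sketch of why rank-based methods suit ordinal data: ranks depend only on the ordering of the observations, so any order-preserving relabelling of the categories leaves them unchanged. The star ratings and the relabelling below are hypothetical, and `average_ranks` is an illustrative helper rather than a library function.

```python
def average_ranks(values):
    """Rank observations from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of observations tied with position `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Hypothetical star ratings (1 to 4) from five reviewers.
stars = [3, 1, 4, 1, 2]
# An arbitrary order-preserving relabelling of the four categories.
relabelled = [{1: 10, 2: 20, 3: 50, 4: 900}[s] for s in stars]

ranks_stars = average_ranks(stars)
ranks_relabelled = average_ranks(relabelled)
print(ranks_stars)
print(ranks_stars == ranks_relabelled)
```

Any statistic computed from the ranks alone therefore gives the same answer whatever numeric codes are attached to the categories, whereas a mean or variance of the codes would change.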

As non-parametric methods make fewer assumptions, their applicability is much wider than the
corresponding parametric methods. In particular, they may be applied in situations where less is known
about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods
are more robust.

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the
use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this
simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving
less room for improper use and misunderstanding.

The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a
parametric test would be appropriate, non-parametric tests have less statistical power. In other words, a larger sample
size can be required to draw conclusions with the same degree of confidence.
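
The power trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions: data are normal with known unit variance (so a z-test is the appropriate parametric test), the true shift is 0.6, and the non-parametric competitor is the exact sign test. The sample size, shift, seed, and function names are all chosen for illustration only.

```python
import math
import random

def z_test_rejects(sample, z_crit=1.959964):
    """Parametric test of mean 0, assuming normal data with known unit variance."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    return abs(z) > z_crit

def sign_test_rejects(sample, alpha=0.05):
    """Exact two-sided sign test of median 0; no distributional form is assumed."""
    n = len(sample)
    positives = sum(1 for x in sample if x > 0)
    tail = min(positives, n - positives)
    # Two-sided p-value: twice the lower Binomial(n, 1/2) tail, capped at 1.
    p_lower = sum(math.comb(n, k) for k in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * p_lower) <= alpha

def estimate_power(rejects, shift, n, reps, seed):
    """Fraction of simulated normal samples (mean `shift`) on which the test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(shift, 1.0) for _ in range(n)]
        hits += rejects(sample)
    return hits / reps

# Same seed for both tests, so they are compared on identical simulated samples.
power_z = estimate_power(z_test_rejects, shift=0.6, n=20, reps=2000, seed=1)
power_sign = estimate_power(sign_test_rejects, shift=0.6, n=20, reps=2000, seed=1)
print(power_z, power_sign)
```

When the normal model really holds, the z-test rejects the false null hypothesis more often than the sign test at the same sample size; the sign test would need more observations to reach the same power.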

Non-parametric models
Non-parametric models differ from parametric models in that the model structure is not specified a priori
but is instead determined from data. The term non-parametric is not meant to imply that such models
completely lack parameters but that the number and nature of the parameters are flexible and not fixed in
advance.

A histogram is a simple nonparametric estimate of a probability distribution.
Kernel density estimation is another method to estimate a probability distribution.
Nonparametric regression and semiparametric regression methods have been developed
based on kernels, splines, and wavelets.
Data envelopment analysis provides efficiency coefficients similar to those obtained by
multivariate analysis without any distributional assumption.
k-nearest neighbors (KNN) classifiers label an unseen instance based on the K points in the
training set that are nearest to it.
A support vector machine (with a Gaussian kernel) is a nonparametric large-margin
classifier.
The method of moments with polynomial probability distributions.
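
As one concrete case from the list above, here is a minimal sketch of a k-nearest-neighbors classifier. The training points, labels, and the function name `knn_predict` are hypothetical; Euclidean distance and a simple majority vote are assumed.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set in the plane.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

label_near_origin = knn_predict(train_points, train_labels, (0.5, 0.5), k=3)
label_far = knn_predict(train_points, train_labels, (5.5, 5.5), k=3)
print(label_near_origin, label_far)
```

The classifier stores no fixed set of fitted coefficients: the training set itself acts as the model, so the number of "parameters" is not fixed in advance and grows with the data.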

Methods
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for
statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the form of the probability
distributions of the variables being assessed. The most frequently used methods include:

Analysis of similarities
Anderson–Darling test: tests whether a sample is drawn from a given distribution
Statistical bootstrap methods: estimates the accuracy/sampling distribution of a statistic
Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes
have identical effects
Cohen's kappa: measures inter-rater agreement for categorical items
Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized
block designs have identical effects
Empirical likelihood
Kaplan–Meier: estimates the survival function from lifetime data, modeling censoring
Kendall's tau: measures statistical dependence between two variables
Kendall's W: a measure between 0 and 1 of inter-rater agreement
Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or
whether two samples are drawn from the same distribution
Kruskal–Wallis one-way analysis of variance by ranks: tests whether > 2 independent
samples are drawn from the same distribution
Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic
variations such as day of the week
Logrank test: compares survival distributions of two right-skewed, censored samples
Mann–Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the
same distribution, as compared to a given alternative hypothesis.
McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and
matched pairs of subjects, row and column marginal frequencies are equal
Median test: tests whether two samples are drawn from distributions with equal medians
Pitman's permutation test: a statistical significance test that yields exact p values by
examining all possible rearrangements of labels
Rank products: detects differentially expressed genes in replicated microarray experiments
Siegel–Tukey test: tests for differences in scale between two groups
Sign test: tests whether matched pair samples are drawn from distributions with equal
medians
Spearman's rank correlation coefficient: measures statistical dependence between two
variables using a monotonic function
Squared ranks test: tests equality of variances in two or more samples
Tukey–Duckworth test: tests equality of two distributions by using ranks
Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually
independent/random
Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations
with different mean ranks
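
To make two of the listed procedures concrete, the sketch below computes the Mann-Whitney U statistic and then obtains an exact two-sided p-value by enumerating every relabelling of the pooled observations, in the spirit of a permutation test. The data and function names are hypothetical, and full enumeration is only practical for small samples.

```python
from itertools import combinations

def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 1/2."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def exact_permutation_p_value(sample_a, sample_b):
    """Two-sided exact p-value for U over all relabellings of the pooled data."""
    pooled = list(sample_a) + list(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    center = n_a * n_b / 2.0  # expected value of U when both samples share a distribution
    observed = abs(mann_whitney_u(sample_a, sample_b) - center)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_a, group_b) - center) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical measurements from two small independent groups.
group_a = [1.1, 2.3, 2.9, 3.4]
group_b = [4.0, 4.8, 5.5, 6.1]

u_stat = mann_whitney_u(group_a, group_b)
p_value = exact_permutation_p_value(group_a, group_b)
print(u_stat, p_value)
```

Because the p-value comes from counting relabellings rather than from a reference distribution such as the normal, it requires no assumption about the shape of the underlying distributions, only that the labels are exchangeable under the null hypothesis.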

History
Early nonparametric statistics include the median (13th century or earlier, use in estimation by Edward
Wright, 1599; see Median § History) and the sign test by John Arbuthnot (1710) in analyzing the human
sex ratio at birth (see Sign test § History).[3][4]

See also
CDF-based nonparametric confidence interval
Parametric statistics
Resampling (statistics)
Semiparametric model

Notes
1. Pearce, J; Derrick, B (2019). "Preliminary testing: The devil of statistics?".
Reinvention: An International Journal of Undergraduate Research. 12 (2).
doi:10.31273/reinvention.v12i2.339 (https://doi.org/10.31273/reinvention.v12i2.339).
2. Stuart A., Ord J.K, Arnold S. (1999), Kendall's Advanced Theory of Statistics: Volume 2A—
Classical Inference and the Linear Model, sixth edition, §20.2–20.3 (Arnold).
3. Conover, W.J. (1999), "Chapter 3.4: The Sign Test", Practical Nonparametric Statistics
(Third ed.), Wiley, pp. 157–176, ISBN 0-471-16068-7
4. Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman &
Hall, ISBN 0-412-44980-3

General references
Bagdonavicius, V., Kruopis, J., Nikulin, M.S. (2011). "Non-parametric tests for complete
data", ISTE & WILEY: London & Hoboken. ISBN 978-1-84821-269-5.
Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach.
Wiley. ISBN 978-1118840313.
Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric Statistical
Inference, 4th Ed. CRC Press. ISBN 0-8247-4052-1.
Hettmansperger, T. P.; McKean, J. W. (1998). Robust Nonparametric Statistical Methods.
Kendall's Library of Statistics. Vol. 5 (First ed.). London: Edward Arnold. New York: John
Wiley & Sons. ISBN 0-340-54937-8. MR 1604954
(https://mathscinet.ams.org/mathscinet-getitem?mr=1604954). Also ISBN 0-471-19479-4.
Hollander M., Wolfe D.A., Chicken E. (2014). Nonparametric Statistical Methods, John Wiley
& Sons.
Sheskin, David J. (2003) Handbook of Parametric and Nonparametric Statistical
Procedures. CRC Press. ISBN 1-58488-440-1
Wasserman, Larry (2007). All of Nonparametric Statistics, Springer. ISBN 0-387-25145-6.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Nonparametric_statistics&oldid=1164568209"
