
Unit I

The basic aim of statistical inference is to form a conclusion about a population parameter from a statistic computed on a sample drawn from that population.
A population is the complete set of observations about which an investigator
wishes to draw conclusions. A sample is a part of that population.
A population is defined in terms of observations rather than people.
A population is defined by the interest of the investigator.

Parametric Tests
A parametric test is a statistical test in which specific assumptions are made about the population parameters.
Parametric tests assume that the data follow a specific distribution, usually a normal distribution; that variances are homogeneous (equal across groups); and that the data are measured on an interval or ratio scale.
The observations must be independent: the inclusion or exclusion of any case in the sample should not unduly affect the results of the study.
The meaningfulness of the results of a parametric test depends on the validity of these assumptions.
Parametric tests are sensitive to outliers and non-normality, which can affect the validity of the results.
Parametric tests are useful because, when their assumptions hold, they are the most powerful tests of the significance of the computed sample statistics.
Examples: t-test (compares means between two groups), ANOVA (compares
means across multiple groups)
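As a sketch of the two examples above, the code below runs both tests in Python with SciPy; the group scores, sample sizes, and seed are made-up assumptions, not values from this unit:

```python
# Illustrative sketch (assumed data): two parametric tests with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                    # fixed seed for reproducibility
group_a = rng.normal(loc=50, scale=10, size=30)   # hypothetical scores
group_b = rng.normal(loc=55, scale=10, size=30)
group_c = rng.normal(loc=52, scale=10, size=30)

# t-test: compares the means of two groups
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_t:.4f}")

# ANOVA: compares means across multiple groups
f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_f:.4f}")
```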

Non-Parametric Tests
Non-parametric tests are distribution-free tests: statistical tests that are not based on a normal distribution of the data or on any other assumption about population parameters.
The distribution of the test statistic does not depend on the shape of the population distribution, and such tests can be applied to data measured on ordinal or nominal scales, as well as interval and ratio scales.
Results are interpreted in terms of ranks or medians.
Non-parametric tests are less sensitive to outliers and can be used for skewed distributions and small sample sizes.
The use of non-parametric tests is recommended in the following situations:
The sample size is quite small, as small as N = 5 or N = 6
Assumptions such as normality of the distribution of scores in the population are doubtful
The measurements are available only in the form of ordinal or nominal scales, or the data can be expressed in the form of ranks
Non-parametric tests typically make fewer assumptions about the data and may therefore be the more appropriate choice in such situations.
Examples: Spearman's rank correlation (measures the strength and direction
of the association between two ranked variables), chi-square test (tests the
association between categorical variables)
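A minimal sketch of the two examples, again with SciPy; the ranks and the contingency table are invented for illustration:

```python
# Illustrative sketch (assumed data): two non-parametric tests with SciPy.
import numpy as np
from scipy import stats

# Spearman's rank correlation between two sets of ranks
ranks_1 = [1, 2, 3, 4, 5, 6]
ranks_2 = [2, 1, 4, 3, 6, 5]
rho, p_rho = stats.spearmanr(ranks_1, ranks_2)
print(f"rho = {rho:.3f}, p = {p_rho:.4f}")

# Chi-square test of association on a 2x2 contingency table
table = np.array([[20, 10],
                  [15, 25]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_chi:.4f}")
```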

Z-test

Random Sampling
A random sample of a given population is a sample so drawn that each
possible sample of that size has an equal probability of being selected from
the population.
The method of selection, not the particular sample outcome, defines a
random sample.
There are two sampling plans that yield a random sample:
sampling with replacement- in which an element may appear more
than once in a sample
sampling without replacement- in which no element may appear more
than once
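The two plans can be sketched in a few lines of Python with NumPy; the ten-score population here is a made-up example:

```python
# Illustrative sketch (assumed population): the two sampling plans.
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1, 11)   # hypothetical population of 10 scores

# Sampling with replacement: an element may appear more than once
with_replacement = rng.choice(population, size=5, replace=True)

# Sampling without replacement: no element may appear more than once
without_replacement = rng.choice(population, size=5, replace=False)

print(with_replacement, without_replacement)
```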
A random sampling distribution of the mean is the relative frequency distribution of the mean (X-bar) obtained from all possible random samples of a given size that could be drawn from a given population.
Characteristics of the Random Sampling Distribution of the Mean
The expected value of the sample mean is the same as the mean of the population of scores from which the samples were drawn: 𝜇X̄ = 𝜇X.

The standard deviation of the random sampling distribution of the mean, called the standard error of the mean, depends on the standard deviation of the population, 𝜎X, and the sample size, n: 𝜎X̄ = 𝜎X / √n.

If the population of scores is normally distributed, the sampling distribution of the mean will also be normally distributed, regardless of sample size.

Central Limit Theorem


The central limit theorem states that the random sampling distribution of the mean tends toward a normal distribution irrespective of the shape of the population of observations sampled; the approximation to the normal distribution improves as sample size increases.

The central limit theorem also states that the sampling distribution will have
the following properties:
The mean of the sampling distribution will be equal to the mean of the population distribution: 𝜇X̄ = 𝜇X.
The variance of the sampling distribution will be equal to the variance of the population distribution divided by the sample size: 𝜎²X̄ = 𝜎²X / n.


The central limit theorem guarantees that the sampling distribution of the mean will be approximately normal under the following conditions:
The sample size is sufficiently large. This condition is usually met if the
sample size is n ≥ 30.
The samples are independent and identically distributed random
variables. This condition is usually met if the sampling is random.
The population’s distribution has finite variance. Central limit theorem
doesn’t apply to distributions with infinite variance, such as the Cauchy
distribution.
Practical applications of the central limit theorem:
Quality control: Monitoring manufacturing processes.
Economics: Analyzing average income, expenditure, etc.
Finance: Modeling stock returns and risks.
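The theorem is easy to check by simulation. The sketch below, under the assumption of an exponential (strongly skewed) population with mean 1 and variance 1, shows the mean and variance of X-bar matching 𝜇X and 𝜎²X / n as the sample size grows:

```python
# Illustrative simulation of the central limit theorem (assumed population).
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_var = 1.0, 1.0   # exponential(scale=1) has mean 1, variance 1

for n in (2, 5, 30):
    # 10,000 sample means, each computed from a sample of size n
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:2d}: mean of X-bar = {means.mean():.3f} (expect {pop_mean}), "
          f"var of X-bar = {means.var():.3f} (expect {pop_var / n:.3f})")
```

As n increases, a histogram of these sample means also looks increasingly normal, despite the skewed population.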

Testing Hypothesis
A hypothesis is a statement about a population parameter that is subjected to test and, on the outcome of the test, retained or rejected.
The key to any problem in statistical inference is to discover what sample
values will occur by chance in repeated sampling and with what probability.
Sampling Distribution: a theoretical relative frequency distribution of the
values of a statistic that would be obtained by chance from an infinite
number of samples of a particular size drawn from a given population.
Probability Samples: samples for which the probability of inclusion in the
sample of each element in the population is known.
The goal of hypothesis testing is to make inferences about a population
based on a sample.

Null Hypothesis
The hypothesis that a researcher tests is called the null hypothesis, symbolized Ho. It is the hypothesis that he or she will decide to retain or reject.
The null hypothesis is simply whatever hypothesis we choose to test.
The null hypothesis is a statement that expects no difference or effect.
Level of Significance

In Statistics, “significance” means “not by chance” or “probably true”.


The level of significance is the probability value used as a criterion to decide that an obtained sample statistic (X-bar) has a low probability of occurring by chance if the null hypothesis is true (resulting in rejection of the null hypothesis).
It determines whether the null hypothesis is rejected or retained.
𝜶 (alpha) is the symbol for the level of significance.
The level of significance is generally chosen as 0.01 or 0.05

Conclusion

Region of Rejection- the portion of the sampling distribution of the mean (consisting of values of the mean that are unlikely to have occurred by chance if Ho is true) that leads to rejection of Ho.
Region of Retention- the portion of the sampling distribution of the mean that leads to retention of Ho.
Critical value(s)- the value(s) that separate the region of rejection from the region of retention.
To determine the position of the obtained X-bar, it must be expressed as a z score:
z = (X̄ − 𝜇) / 𝜎X̄, where X̄ is the obtained sample mean, 𝜇 is the population mean stated in Ho, and 𝜎X̄ = 𝜎X / √n
Rejecting the null hypothesis- the obtained sample statistic (X-bar) has a
low probability of occurring by chance if the value of the population
parameter stated in Ho is true
Retaining the null hypothesis- we do not have sufficient evidence to reject
Ho
Large samples increase the precision by reducing sampling variation.

Alternative Hypothesis
Alternative hypothesis- a hypothesis about a population parameter that
contradicts the null hypothesis
Ha is the symbol for the alternative hypothesis.
The alternative hypothesis is one that expects some difference or effect.
The alternative hypothesis may be directional or non-directional
The time to decide on the nature of the alternative hypothesis is at the
beginning of the study, before the data are collected.

One-Tailed (Directional) Hypothesis
The alternative hypothesis states that the population parameter differs from the value stated in Ho in one particular direction (and the critical region is located in only one tail of the sampling distribution).
A directional alternative hypothesis is appropriate only when there is no
practical difference in meaning between retaining the null hypothesis and
concluding that a difference exists in a direction opposite to that stated in
the directional alternative hypothesis.

Two-Tailed (Non-Directional) Hypothesis
The alternative hypothesis states that the population parameter may be either less than or greater than the value stated in Ho (and the critical region is divided between both tails of the sampling distribution).
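The difference between the two alternatives shows up in the critical values. A brief sketch, assuming a z statistic and 𝜶 = 0.05:

```python
# Illustrative sketch: one-tailed vs two-tailed critical values at alpha = 0.05.
from scipy import stats

alpha = 0.05
z_one = stats.norm.ppf(1 - alpha)       # one-tailed (upper): all of alpha in one tail
z_two = stats.norm.ppf(1 - alpha / 2)   # two-tailed: alpha split between both tails

print(f"one-tailed critical z = {z_one:.3f}")    # about 1.645
print(f"two-tailed critical z = ±{z_two:.3f}")   # about 1.960
```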

Assumptions of Z-test
A random sample has been drawn from the population. This ensures that
each member of the population has an equal chance of being included in the
sample, which helps in making the sample representative of the population.
The sample has been drawn by the with-replacement sampling plan.
The sampling distribution of the mean follows the normal curve. When the scores in the population are not normally distributed, the central limit theorem comes to the rescue, provided the sample size is 30 or larger.
The standard deviation of the population of scores is known.

The observations in the sample must be independent of each other. This means that the value of one observation should not influence the value of another observation.
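Under these assumptions, a one-sample z-test is a few lines of arithmetic. The numbers below (the mean under Ho, 𝜎X, X-bar, and n) are hypothetical:

```python
# Illustrative sketch (assumed numbers): one-sample z-test.
import math
from scipy import stats

mu_0 = 100       # population mean stated in Ho
sigma = 15       # known population standard deviation
x_bar = 104.5    # obtained sample mean
n = 36           # sample size

se = sigma / math.sqrt(n)    # standard error of the mean
z = (x_bar - mu_0) / se      # z = (X-bar - mu) / (sigma / sqrt(n))
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}, two-tailed p = {p_two_tailed:.4f}")
```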
t-test for single mean (Deviational Formula)
Unbiased Estimator- the mean of the estimates made from all possible samples of the same size equals the value of the parameter estimated (X-bar is an unbiased estimator of 𝜇X; sX is not an unbiased estimator of 𝜎X).
Estimated Standard Error of the Mean- an estimate of the standard deviation of the random sampling distribution of means: sX̄ = s / √n, where s is computed from the deviations, s = √( Σ(X − X̄)² / (n − 1) ).
t-test (Raw Score Method)
The same statistic, t = (X̄ − 𝜇) / sX̄, can be computed directly from raw scores by using s = √( (ΣX² − (ΣX)² / n) / (n − 1) ).
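A brief sketch of the single-mean t-test with hypothetical scores, computed first from the formula above and then checked against SciPy's ttest_1samp:

```python
# Illustrative sketch (assumed scores): t-test for a single mean.
import math
import numpy as np
from scipy import stats

scores = np.array([12, 15, 11, 14, 13, 16, 12, 15])  # hypothetical sample
mu_0 = 12                                             # value stated in Ho

n = scores.size
x_bar = scores.mean()
s = scores.std(ddof=1)       # sample SD with n - 1 in the denominator
se = s / math.sqrt(n)        # estimated standard error of the mean
t = (x_bar - mu_0) / se

t_check, p_check = stats.ttest_1samp(scores, mu_0)
print(f"by hand: t = {t:.3f}; scipy: t = {t_check:.3f}, p = {p_check:.4f}")
```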

Student's Distribution of t

Because the denominator contains sX̄, which is itself a variable that changes from sample to sample, this statistic does not follow the normal distribution.
The British statistician William S. Gosset, writing under the pen name "Student", presented the proper distribution for it, which has been referred to as Student's distribution of t.
Student’s distribution of t- a theoretical relative frequency distribution of all
the values of Means converted to t that would be obtained by chance from
an infinite number of samples of a particular size drawn from a given
population

Characteristics
Student’s distribution of t is not a single distribution, but rather a family of
distributions. They differ in their degree of approximation to the normal
curve
The mean of the t-distribution is zero.
t-distributions are symmetrical.
t-distributions are unimodal.
The t-distribution is platykurtic compared to the normal distribution (i.e., it is flatter at the peak and has a greater concentration in the tails than does a normal curve).
The shape of the t-distribution depends on the degrees of freedom, which
are typically related to the sample size, df = n - 1
As the degrees of freedom increase, the t-distribution approaches the
normal distribution. For df > 30, the t-distribution is very close to the normal
distribution.
The t-distribution has a standard deviation larger than that of the standard normal distribution (for which 𝜎Z = 1).
As the degrees of freedom increase, the standard deviation approaches 1,
which is the standard deviation of the standard normal distribution.
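This convergence is easy to see in the two-tailed 5% critical values, sketched below with SciPy:

```python
# Illustrative sketch: critical t values approach the normal value as df grows.
from scipy import stats

for df in (5, 10, 30, 120):
    print(f"df = {df:3d}: critical t = {stats.t.ppf(0.975, df):.3f}")
print(f"normal curve: critical z = {stats.norm.ppf(0.975):.3f}")   # 1.960
```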

Assumptions of t-test
The t-test is a statistical test used to compare a sample mean against a hypothesized value, or the means of two groups.

The data should be collected using a random sampling method. This ensures that the sample is representative of the population.
The data should be approximately normally distributed. This assumption is particularly important for small sample sizes (typically n < 30).
Independence: For independent two-sample t-tests, the samples should be
independent of each other.
The standard deviation of the population of scores is unknown.
Equal Variances: For independent two-sample t-tests, the variances of the
two populations should be equal (homogeneity of variances).
The scale of measurement applied to the data collected follows a
continuous or ordinal scale
Paired Samples: For paired-sample t-tests, the observations should be
paired and dependent.

Differences and Similarities between z and t
The z-test and t-test are both statistical methods used for hypothesis testing, particularly for comparing means between two groups or for testing the significance of a sample mean against a known or hypothesized population mean.
Similarities:

Both tests are used to assess whether there is a significant difference between the means of two groups or between a sample mean and a population mean.
Both tests are parametric, meaning they make assumptions about the
underlying distribution of the data to make inferences.
Both tests generate a test statistic that is used to determine the probability
(p-value) of observing the sample data if the null hypothesis is true.
Both tests assume that all data points are independent.

Differences:

In the t-test, the standard deviation of the population is unknown, whereas in the z-test the standard deviation of the population is known.
The t-test is based on Student's t-distribution; the z-test, in contrast, relies on the assumption that the distribution of sample means is normal.
The z-test is used when the sample size is large (n > 30), while the t-test is appropriate when the sample size is small (n < 30).

Degrees of Freedom
The degrees of freedom associated with s, the estimated standard deviation
of a population, corresponds to the number of observations that are
completely free to vary.
df = n − 1
The degrees of freedom of a test statistic determine the critical value of the hypothesis test.
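For example, under assumed values n = 15 and 𝜶 = 0.05 (two-tailed):

```python
# Illustrative sketch (assumed n and alpha): df determines the critical value.
from scipy import stats

n, alpha = 15, 0.05
df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value
print(f"df = {df}, critical t = ±{t_crit:.3f}")
```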

Statistical conclusion- a conclusion about the numerical property of the data (reject or retain Ho).
Research conclusion- a conclusion about the subject matter.
Levels of Significance versus p-values
Level of Significance
The level of significance, denoted by 𝜶 (alpha), is the threshold set by the researcher to determine when to reject the null hypothesis.
It represents the probability of committing a Type I error, which is rejecting the null hypothesis when it is actually true.
Typical values for 𝜶 are 0.05, 0.01, and 0.1.
For example, an 𝜶 of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
Before conducting a test, the researcher decides on the level of significance. If the p-value obtained from the test is less than or equal to 𝜶, the null hypothesis is rejected.

P-value
The p-value is the probability, when Ho is true, of observing a sample mean as deviant as or more deviant than (in the direction specified in Ha) the obtained value of the mean.
It measures the strength of evidence against the null hypothesis.
It is not established in advance and is not a statement of risk; it simply
describes the rarity of the sample outcome if Ho is true.
If the p-value is less than or equal to the level of significance, reject the null hypothesis and declare the findings statistically significant.
If the p-value is greater than 𝜶, do not reject the null hypothesis; there is not enough evidence to conclude that the effect is statistically significant.
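The decision rule is mechanical once 𝜶 is fixed in advance. A minimal sketch with a hypothetical sample and a one-sample t-test:

```python
# Illustrative sketch (assumed data): comparing a p-value to alpha.
from scipy import stats

alpha = 0.05                                        # chosen before the test
sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.3, 5.0]   # hypothetical measurements
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

if p_value <= alpha:
    print(f"p = {p_value:.4f} <= {alpha}: reject Ho")
else:
    print(f"p = {p_value:.4f} > {alpha}: retain Ho")
```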
