What Is A Chi-Square Statistic?
Roll no 5041
Qno. 1
How are the chi-square test and t-test used in analytical chemistry to accept or reject results?
Ans.
Chi-square tests are often used in hypothesis testing. The chi-square statistic compares the size
of any discrepancies between the expected results and the actual results, given the sample size
and the number of variables in the relationship. For these tests, degrees of freedom are used to
determine whether a certain null hypothesis can be rejected based on the total number of
variables and samples in the experiment. As with any statistic, the larger the sample size, the
more reliable the results.
KEY TAKEAWAYS
A chi-square (χ2) statistic is a measure of the difference between the observed and expected
frequencies of the outcomes of a set of events or variables.
χ2 depends on the size of the difference between observed and expected values, the degrees of
freedom, and the sample size.
χ2 can be used to test whether two variables are related or independent from one another or to
test the goodness-of-fit between an observed distribution and a theoretical distribution of
frequencies.
The Formula for Chi-Square Is
\begin{aligned}
&\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i} \\
&\textbf{where:} \\
&c = \text{Degrees of freedom} \\
&O = \text{Observed value(s)} \\
&E = \text{Expected value(s)}
\end{aligned}
For example, suppose we want to know whether students' sex is related to course selection. If
there is no relationship between sex and course selection (that is, if they are independent), then
the frequencies at which male and female students select each offered course should be
approximately equal; equivalently, the proportion of male and female students in any selected
course should be approximately equal to the proportion of male and female students in the
sample. A χ2 test for independence can tell us how likely it is that random chance alone explains
any observed difference between the actual frequencies in the data and these theoretical
expectations.
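A sketch of such an independence test, using scipy; the enrollment counts below are hypothetical, invented purely for illustration:

```python
# Chi-square test of independence: is course selection related to sex?
# The enrollment counts are hypothetical, invented for illustration.
from scipy.stats import chi2_contingency

observed = [
    [30, 25, 25],  # male students enrolled in courses 1-3 (hypothetical)
    [20, 25, 25],  # female students enrolled in courses 1-3 (hypothetical)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
if p < 0.05:
    print("Reject independence: sex and course selection appear related.")
else:
    print("Chance alone can plausibly explain the observed differences.")
```

With counts this close to the expected proportions, p comes out well above 0.05, so the test fails to reject independence.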
Goodness-of-Fit
χ2 provides a way to test how well a sample of data matches the (known or assumed)
characteristics of the larger population that the sample is intended to represent. If the sample data
do not fit the expected properties of the population that we are interested in, then we would not
want to use this sample to draw conclusions about the larger population.
For example, consider an imaginary coin with an exactly 50/50 chance of landing heads or tails,
and a real coin that you toss 100 times. If the real coin is fair, it too has an equal probability of
landing on either side, so the expected result of tossing it 100 times is 50 heads and 50 tails. In
this case, χ2 can tell us how well the actual results of 100 coin flips match the theoretical model
of a fair coin giving 50/50 results. The actual tosses could come up 50/50, or 60/40, or even
90/10. The farther the actual results of the 100 tosses are from 50/50, the worse the fit of this set
of tosses to the theoretical expectation of 50/50, and the more likely we are to conclude that the
coin is not actually fair.
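As a minimal sketch, the coin example can be run as a chi-square goodness-of-fit test; the 60/40 outcome below is one hypothetical run of 100 tosses:

```python
# Chi-square goodness-of-fit: does a 60-heads / 40-tails run fit a fair coin?
from scipy.stats import chisquare

observed = [60, 40]  # hypothetical outcome of 100 tosses
expected = [50, 50]  # fair-coin expectation

chi2, p = chisquare(f_obs=observed, f_exp=expected)
# chi2 = (60-50)**2/50 + (40-50)**2/50 = 4.0
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")
print("fair coin is plausible" if p >= 0.05 else "coin looks biased")
```

With 60 heads in 100 tosses, p falls just below 0.05, so at the 5% level we would doubt that the coin is fair; a 55/45 split, by contrast, would not be enough to reject fairness.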
What Is a T-Test?
A t-test is a type of inferential statistic used to determine if there is a significant difference
between the means of two groups, which may be related in certain features. It is mostly used
when the data sets, like the data set recorded as the outcome from flipping a coin 100 times,
would follow a normal distribution and may have unknown variances. A t-test is used as a
hypothesis testing tool, which allows testing of an assumption applicable to a population.
Mathematically, the t-test takes a sample from each of the two sets and establishes the problem
statement by assuming a null hypothesis that the two means are equal. Based on the applicable
formulas, certain values are calculated and compared against the standard values, and the
assumed null hypothesis is accepted or rejected accordingly.
If the null hypothesis is rejected, it indicates that the observed difference is strong and probably
not due to chance. The t-test is just one of many tests used for this purpose. To examine more
variables, or tests with larger sample sizes, statisticians must use other tests: for a large sample
size they use a z-test, and other options include the chi-square test and the f-test.
There are three types of t-tests, and they are categorized as dependent and independent t-tests.
For example, suppose a drug trial compares a group given a new drug with a placebo-fed control
group. After the trial, the members of the control group report an increase in average life
expectancy of three years, while the members of the group prescribed the new drug report an
increase of four years. Instant observation may suggest that the drug is indeed working, since the
results are better for the group using the drug. However, it is also possible that the observed
difference is a chance occurrence. A t-test helps determine whether the difference is statistically
significant and applicable to the entire population.
In a school, 100 students in class A scored an average of 85% with a standard deviation of 3%.
Another 100 students belonging to class B scored an average of 87% with a standard deviation of
4%. While the average of class B is better than that of class A, it may not be correct to jump to
the conclusion that the overall performance of students in class B is better than that of students in
class A. This is because there is natural variability in the test scores in both classes, so the
difference could be due to chance alone. A t-test can help to determine whether one class fared
better than the other.
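Since only summary statistics are given for the two classes, the test can be sketched with scipy's `ttest_ind_from_stats`, which accepts means, standard deviations, and sample sizes directly:

```python
# Two-sample t-test for class A vs. class B from summary statistics alone.
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=85, std1=3, nobs1=100,  # class A: mean 85%, sd 3%, 100 students
    mean2=87, std2=4, nobs2=100,  # class B: mean 87%, sd 4%, 100 students
)
# t = (85 - 87) / sqrt(3**2/100 + 4**2/100) = -4.0 for equal group sizes
print(f"t = {t:.2f}, p = {p:.6f}")
```

Here p is far below 0.05, so the two-point gap between the class averages is unlikely to be due to chance alone.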
T-Test Assumptions
1. The first assumption made regarding t-tests concerns the scale of measurement. The
assumption for a t-test is that the scale of measurement applied to the data collected follows a
continuous or ordinal scale, such as the scores for an IQ test.
2. The second assumption made is that of a simple random sample, that the data is collected from
a representative, randomly selected portion of the total population.
3. The third assumption is that the data, when plotted, results in a normal, bell-shaped
distribution curve.
4. The final assumption is the homogeneity of variance. Homogeneous, or equal, variance exists
when the standard deviations of samples are approximately equal.
Calculating T-Tests
Calculating a t-test requires three key data values. They include the difference between the mean
values from each data set (called the mean difference), the standard deviation of each group, and
the number of data values of each group.
The outcome of the t-test produces the t-value. This calculated t-value is then compared against a
value obtained from a critical value table (called the T-Distribution Table). This comparison
helps to determine the effect of chance alone on the difference, and whether the difference is
outside that chance range. The t-test questions whether the difference between the groups
represents a true difference in the study or if it is possibly a meaningless random difference.
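Instead of a printed table, the critical value can be computed directly; a sketch using the t-distribution's inverse CDF:

```python
# Two-tailed critical t-value at the 5% significance level, df = 24.
from scipy.stats import t as t_dist

alpha = 0.05
df = 24
# ppf is the inverse CDF; 1 - alpha/2 gives the upper cut-off for a two-tailed test
critical = t_dist.ppf(1 - alpha / 2, df)
print(f"critical t (df={df}, alpha={alpha}) = {critical:.3f}")  # 2.064
```

A computed t-value larger in magnitude than this cut-off leads to rejecting the null hypothesis at the 5% level.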
T-Distribution Tables
The T-Distribution Table is available in one-tailed and two-tailed formats. The former is used for
assessing cases that have a fixed value or range with a clear direction (positive or negative); for
instance, what is the probability of the output value remaining below -3, or of getting more than
seven when rolling a pair of dice? The latter is used for range-bound analysis, such as asking
whether the coordinates fall between -2 and +2.
The calculations can be performed with standard software programs that support the necessary
statistical functions, like those found in MS Excel.
Correlated or paired t-tests are of a dependent type, as these involve cases where the two sets of
samples are related or have matching characteristics, such as repeated measurements on the same
subjects or a comparative analysis involving children, parents, or siblings.
The formula for computing the t-value and degrees of freedom for a paired t-test is:
\begin{aligned}
&T = \frac{\textit{mean}_1 - \textit{mean}_2}{\dfrac{s(\text{diff})}{\sqrt{n}}} \\
&\textbf{where:} \\
&\textit{mean}_1 \text{ and } \textit{mean}_2 = \text{The average values of each of the sample sets} \\
&s(\text{diff}) = \text{The standard deviation of the differences of the paired data values} \\
&n = \text{The sample size (the number of paired differences)} \\
&n - 1 = \text{The degrees of freedom}
\end{aligned}
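The paired-test formula can be checked against a library implementation; in this sketch the before/after readings are hypothetical, and the manually computed t-value is compared with scipy's paired test:

```python
# Paired t-test: the formula above, checked against scipy.stats.ttest_rel.
# The before/after readings are hypothetical.
import math
import numpy as np
from scipy.stats import ttest_rel

before = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.4])
after = np.array([11.6, 11.1, 12.3, 12.5, 11.2, 12.0])

diff = before - after
n = len(diff)
# T = (mean1 - mean2) / (s(diff) / sqrt(n)), with n - 1 degrees of freedom
t_manual = (before.mean() - after.mean()) / (diff.std(ddof=1) / math.sqrt(n))

t_scipy, p = ttest_rel(before, after)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, df = {n - 1}")
```

The two t-values agree, since `ttest_rel` applies exactly this formula to the paired differences.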
The remaining two types belong to the independent t-tests. The samples of these types are
selected independent of each other—that is, the data sets in the two groups don’t refer to the
same values. They include cases like a group of 100 patients being split into two sets of 50
patients each. One of the groups becomes the control group and is given a placebo, while the
other group receives the prescribed treatment. This constitutes two independent sample groups
which are unpaired with each other.
Consider the following two sample sets of measurements of paintings received in an art gallery
(Set 1 has 10 values, Set 2 has 20):

Set 1    Set 2
19.7     28.3
20.4     26.7
19.6     20.1
17.8     23.3
18.5     25.2
18.9     22.1
18.3     17.7
18.9     27.6
19.5     20.6
21.95    13.7
         23.2
         17.5
         20.6
         18.0
         23.9
         21.6
         24.3
         20.4
         23.9
         13.3
Though the mean of Set 2 is higher than that of Set 1, we cannot conclude from this alone that the
population corresponding to Set 2 has a higher mean than the population corresponding to Set 1.
Is the difference between the sample means of 19.4 and 21.6 due to chance alone, or do real
differences exist in the overall populations of all the paintings received in the art gallery? We
establish the problem by assuming the null hypothesis that the means of the two sample sets are
equal, and conduct a t-test to check whether that hypothesis is plausible.
Since the number of data records is different (n1 = 10 and n2 = 20) and the variances are also
different, the t-value and degrees of freedom for the above data set are computed using the
unequal variance (Welch) t-test formula.
The t-value is -2.24787. Since the minus sign can be ignored when comparing the two t-values,
the computed value is 2.24787.
The degrees of freedom value is 24.38; it is rounded down to 24, since the formula requires
rounding the value down to the nearest integer.
One can specify a level of probability (alpha level, level of significance, p) as a criterion for
acceptance. In most cases, a 5% level is used.
With 24 degrees of freedom and a 5% level of significance, the t-distribution table gives a critical
value of 2.064. Since the computed value of 2.248 is greater than the table value at the 5%
significance level, it is safe to reject the null hypothesis that there is no difference between the
means. The two populations differ, and the difference is unlikely to be due to chance alone.
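This worked example can be reproduced with an unequal-variance (Welch) t-test; a sketch using the two sample sets, reading the tenth value of Set 1 as 21.95, consistent with the stated means of 19.4 and 21.6:

```python
# Welch's (unequal variance) t-test on the two sample sets from the example.
from scipy.stats import ttest_ind

set1 = [19.7, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 21.95]
set2 = [28.3, 26.7, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
        23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 23.9, 13.3]

t, p = ttest_ind(set1, set2, equal_var=False)  # Welch's t-test
print(f"t = {t:.5f}, p = {p:.4f}")  # t ≈ -2.24787, matching the text
if p < 0.05:
    print("Reject the null hypothesis: the population means differ.")
```

The p-value falls below 0.05, agreeing with the table-based comparison of 2.248 against the critical value of 2.064.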