
Probability and Statistics: Tests & Examples
This presentation explores various statistical tests used to analyse
data and draw conclusions. We'll discuss the fundamentals of
each test, its applications, and relevant examples.

BY - GROUP 3
Small Sample Test
A small sample test refers to statistical tests used to analyze data when the sample size is relatively small,
typically less than 30 observations.

Key Points of Small Sample Tests:
Assumption of Normality:
• Many small sample tests assume that the data follows a normal distribution. This assumption is critical because
the accuracy of the test results depends on it.
t-Distribution:
• When sample sizes are small, the t-distribution is used instead of the normal distribution. The t-distribution is
similar to the normal distribution but has thicker tails, which accounts for the increased variability in small
samples.
Student's t-test
Student's t-distribution is a probability distribution used to estimate
population parameters when the sample size is small and the population
variance (σ²) is unknown.

One-sample t-test
Tests if the mean of a sample is significantly different from a known
population mean.
Example: Comparing the average height of students in a class to
the national average.
Two-sample t-test
Tests if the means of two independent samples are significantly
different.
Example: Testing if there's a difference in the average performance
of two groups of students on a test.
Paired t-test
Tests if the means of two related samples are significantly different.
Example: Suppose a teacher wants to know whether a new teaching
method has improved students' test scores. The teacher administers a
test before and after the new teaching method to the same group of 5
students.
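The paired case can be sketched in a few lines of pure Python. The before/after scores below are hypothetical (the slide does not list them); the t statistic is the mean of the per-student differences divided by its standard error, compared against a t-table with df = n − 1.

```python
import math
import statistics

# Hypothetical before/after scores for the 5 students (not from the slides)
before = [65, 70, 72, 68, 74]
after = [71, 74, 75, 70, 80]

diffs = [a - b for a, b in zip(after, before)]  # per-student improvement
mean_d = statistics.mean(diffs)                 # average improvement
sd_d = statistics.stdev(diffs)                  # sample std. dev. of differences
n = len(diffs)

t = mean_d / (sd_d / math.sqrt(n))              # compare with t-table, df = n - 1
print(f"t = {t:.2f}, df = {n - 1}")             # t = 5.25, df = 4
```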
Practical Example: Comparing Two Study Methods
Imagine you're a teacher and you want to find out if two different study methods lead to
different exam scores for your students. You divide your class into two groups:

Group A uses Study Method A (flashcards).


Group B uses Study Method B (interactive videos).
After a month, both groups take the same exam. Here are their scores:

Group A: 78, 85, 92, 88, 76


Group B: 81, 79, 85, 77, 73
You want to know if the difference in their average scores is significant or if it could have
happened by chance.
Perform an Independent t-test
Calculate the Means (Averages):
Mean of Group A: (78 + 85 + 92 + 88 + 76) / 5 = 83.8
Mean of Group B: (81 + 79 + 85 + 77 + 73) / 5 = 79.0

Understand the Variability:


Variability tells us how spread out the scores are. This is calculated using variance and standard deviation, but let's keep it simple and say
that it measures the differences in individual scores from the average score.

Calculate the t-value: t = (x̄A − x̄B) / √(s_p² (1/nA + 1/nB)), where s_p² is the pooled variance of the two groups.

Degrees of Freedom (df):


df = (number of scores in Group A + number of scores in Group B) - 2
df = 5 + 5 - 2 = 8

Use a t-distribution Table:


Look up the critical t-value for df = 8 and a common significance level (0.05). Let's say the critical t-value is around 2.306.

Compare the calculated t-value to the critical t-value. Suppose our calculated t-value is 2.7.
Significant Result: If our calculated t-value (2.7) is greater than the critical t-value (2.306), we have a significant result. This means the
difference in average scores between the two groups is unlikely to be due to chance.
Not Significant Result: If our calculated t-value were less than 2.306, it would mean that the difference might be due to random chance.
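The steps above can be reproduced with the standard library alone. One caveat: for the actual scores listed, the pooled-variance t works out to about 1.33, below the 2.306 critical value; the 2.7 used in the text is a hypothetical value chosen to illustrate a significant result.

```python
import math
import statistics

group_a = [78, 85, 92, 88, 76]
group_b = [81, 79, 85, 77, 73]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)  # 83.8, 79.0
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Pooled variance (assumes equal population variances)
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t = (mean_a - mean_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))
df = n_a + n_b - 2

print(f"t = {t:.3f}, df = {df}")  # t = 1.329, df = 8 -> below 2.306, not significant
```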
F-test
The F-test compares the variances of two populations. It's used to
determine if the variances are statistically significantly different.

Two-Sample F-Test for Variances


A two-sample F-test is used to compare two population variances
σ₁² and σ₂² when a sample is randomly selected from each
population. The populations must be independent and normally
distributed. The test statistic is F = s₁² / s₂², where s₁² and s₂²
represent the sample variances with s₁² ≥ s₂².

Degrees of freedom for the numerator is d.f.N = n₁ − 1 and
the degrees of freedom for the denominator is d.f.D = n₂ − 1,
where n₁ is the size of the sample having variance s₁²
and n₂ is the size of the sample having variance s₂².
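A minimal sketch of the two-sample F statistic, using hypothetical samples (not from the slides); the larger sample variance goes in the numerator so that F ≥ 1, as the definition above requires.

```python
import statistics

# Hypothetical samples (not from the slides)
sample_1 = [12, 15, 11, 18, 14, 16]
sample_2 = [13, 14, 12, 15, 13]

v1, v2 = statistics.variance(sample_1), statistics.variance(sample_2)

# Put the larger sample variance in the numerator so F >= 1
if v1 >= v2:
    F, df_n, df_d = v1 / v2, len(sample_1) - 1, len(sample_2) - 1
else:
    F, df_n, df_d = v2 / v1, len(sample_2) - 1, len(sample_1) - 1

print(f"F = {F:.3f}, d.f.N = {df_n}, d.f.D = {df_d}")  # F = 5.128, d.f.N = 5, d.f.D = 4
```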
Chi-Square Test
• A statistical test used to determine if there is a significant
association between two categorical variables.
• Compares observed frequencies with expected
frequencies.

Key Conditions:
• Categorical Data: The data used in the chi-square test should
be categorical, meaning it falls into distinct categories or groups.
• Independence: Observations in the data should be
independent of each other. This means that one observation
should not influence another.
• Expected Frequencies: The expected frequency in each cell
of the contingency table should be at least 5. This condition is
crucial for the accuracy of the chi-square distribution
approximation.
χ² = Σ [(O − E)² / E]
Where:
• χ² is the chi-square statistic
• O is the observed frequency
• E is the expected frequency
• Σ is the sum over all categories

E = (Row Total * Column Total) / Grand Total
Steps to Calculate Expected
Frequency:
• Create a contingency table: Organize your data into a table with
rows and columns representing the categories of your two variables.
• Calculate row and column totals: Sum the values within each row
and column.
• Calculate the grand total: Sum all values in the table.
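The three steps above can be sketched directly in Python with a small hypothetical 2×2 table (the counts are made up for illustration):

```python
# Hypothetical contingency table: rows and columns are the two variables' categories
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
grand_total = sum(row_totals)                      # 100

# E = (row total * column total) / grand total, for each cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```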
Real-Life Use of Chi-Square Test:
Healthcare
Example: Smoking and Lung
Cancer
Problem: A researcher wants to investigate if there is a
relationship between smoking and lung cancer. They collect data
on a group of people, classifying them based on smoking status
(smoker, non-smoker) and health outcome (lung cancer, no lung
cancer).

Categorical variables: Smoking status, lung cancer status.


Chi-square test: By comparing the observed frequencies of lung
cancer cases in smokers and non-smokers to the expected
frequencies (assuming no relationship), the researcher can
determine if there is a significant association between smoking
and lung cancer.

A significant chi-square test result would provide evidence


to support the hypothesis that smoking increases the risk
of lung cancer.
Goodness of Fit Test :
• Chi-Square Goodness-of-Fit Test : A statistical test used to determine if a sample
data matches an expected distribution.

• Purpose : To assess whether the observed frequencies differ significantly from the
expected frequencies under a specific hypothesis.

A chi-square (χ²) goodness-of-fit test is a goodness-of-fit test for a categorical variable.
Goodness of fit is a measure of how well a statistical model fits a set of observations.

• When goodness of fit is high, the values expected based on the model are close to
the observed values.

• When goodness of fit is low, the values expected based on the model are far from
the observed values.
Example :
You’re hired by a dog food company to help them test three new dog food flavors.
You recruit a random sample of 75 dogs and offer each dog a choice between the
three flavors by placing bowls in front of them. You expect that the flavors will be
equally popular among the dogs, with about 25 dogs choosing each flavor.
Once you have your experimental results, you plan to use a chi-square goodness of
fit test to figure out whether the distribution of the dogs’ flavor choices is
significantly different from your expectations.
• Null hypothesis (H0) : The dog population chooses the three flavors in equal
proportions (p1 = p2 = p3).

• Alternative hypothesis (Ha) : The dog population does not choose the three
flavors in equal proportions.

χ² = Σ [(O − E)² / E], where:

• χ² : chi-square statistic
• O : observed values
• E : expected values
Find the critical chi-square value in a chi-square critical value table. The critical value
is calculated from a chi-square distribution. To find the critical chi-square value, you’ll
need to know two things:

• The Degrees of Freedom (df) : For chi-square goodness of fit tests, the df is the
number of groups minus one.

• Significance level (α): By convention, the significance level is usually 0.05 .

Since there are three groups (Garlic Blast, Blueberry Delight, and Minty Munch),
there are two degrees of freedom.
For a test of significance at α = 0.05 and df = 2 , the Χ^2 critical value is 5.99 .

Result: the χ² value is less than the critical value.


• If the Χ^2 value is greater than the critical value, then the difference between
the observed and expected distributions is statistically significant (p < α).
⚬ The data allows you to reject the null hypothesis and provides support for the
alternative hypothesis.

• If the Χ^2 value is less than the critical value, then the difference between the
observed and expected distributions is not statistically significant (p > α).
⚬ The data doesn’t allow you to reject the null hypothesis and doesn’t provide
support for the alternative hypothesis.

The Χ^2 value is less than the critical value. Therefore, you should not reject the null
hypothesis that the dog population chooses the three flavors in equal proportions.
There is no significant difference between the observed and expected flavor choice
distribution (p > .05). This suggests that the dog food flavors are equally popular in
the dog population.
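Since the slides do not list the observed flavor counts, here is the same computation with hypothetical counts (28, 24, 23, summing to 75) which, as in the example, yield a χ² below the 5.99 critical value:

```python
observed = [28, 24, 23]   # hypothetical flavor counts (not from the slides)
expected = [75 / 3] * 3   # 25 expected per flavor under H0 (equal proportions)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1    # 3 groups -> 2 degrees of freedom

print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 0.56, df = 2 -> below 5.99, fail to reject H0
```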
Independence of Attributes
This test determines if two categorical variables are independent of each other. It assesses whether there is a statistically
significant association between the variables.

Null Hypothesis: The variables are independent. There is no association between them.

Alternative Hypothesis: The variables are dependent. There is an association between them.

Analysis: Compare observed frequencies to expected frequencies using the chi-square statistic.
Examples: Independence of Attributes
A researcher wants to study the relationship between gender and opinion on a political issue.

Step 1: Formulate a hypothesis. The null hypothesis is that there is no relationship between gender and opinion on the issue. The alternative hypothesis is that there is a relationship.

Step 2: Collect data through a survey and create a contingency table showing the observed frequencies for each combination of gender and opinion.

Step 3: Calculate the expected frequencies for each cell in the table under the assumption of independence. This is done by multiplying the row and column totals and dividing by the grand total.

Step 4: Compare the observed frequencies to the expected frequencies using the chi-square statistic. Determine the p-value associated with the calculated chi-square statistic.
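The four steps can be sketched end-to-end with hypothetical survey counts; for an r×c table the degrees of freedom are (r − 1)(c − 1).

```python
# Hypothetical 2x2 contingency table: rows = gender, cols = opinion (made-up counts)
observed = [[40, 60],
            [60, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi2 = {chi2:.1f}, df = {df}")  # chi2 = 8.0, df = 1 -> exceeds 3.841 at alpha = 0.05
```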
Conclusion
The tests discussed provide powerful tools for analysing data and
drawing meaningful conclusions. Understanding their applications
and limitations is crucial for making informed decisions in various
fields.
Team Members
Aman Goswami (23BAI10294)

Mohit Upadhyay (23BAI10262)

Sia Upadhyay (23BAI10301)

Avinash Kushwaha (23BAI10237)

Ritvik Kaul (23BAI10097)

Anshuman Parmar (23BAI10258)

Kislay Anand (23BAI10359)
