The Most Important Probability Distribution in Statistics

The normal distribution is the most important probability distribution in statistics. It is bell-shaped and symmetrical, with the average in the center. There are different types of normal distributions based on their peaks and tails. The normal distribution is used to calculate probabilities, convert scores to z-scores, and understand sampling distributions and hypothesis testing. Hypothesis testing involves formulating the null and alternative hypotheses, selecting a statistical test, setting the significance level, and determining whether to reject the null hypothesis.

Uploaded by

Missy Cabangal
Copyright
© All Rights Reserved

The Most Important Probability Distribution in Statistics
• Which is more frequent, normal (common) or rare?
• Where can we find the average among the two?
Normal distributions are bell-shaped and symmetrical around the mean.

[Figure: bell curve labeled "Normal" at the center and "rare" at each tail]
• Leptokurtic: high, sharp peak with heavy tails
• Mesokurtic: average peak, like the normal curve
• Platykurtic: flat peak with light tails
• Skewed distributions (may result from outliers)
  o negatively skewed
  o positively skewed
• Bimodal = having 2 peaks
• Multimodal = having 3 or more peaks
• Probability refers to the likelihood of a particular event of interest occurring.
• Conditional probability
  o The probability of a particular event happening if another event (or set of conditions) has also happened.
What is the probability of getting such and such?
• You can convert any score from a sample into a z-score by subtracting the sample mean from the score and then dividing by the sample standard deviation.

z = (X - mean) / SD

• A z-score can be converted into a percentage: the share of scores in a normal distribution that fall below it (its percentile).
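The z-score conversion can be sketched in a few lines of Python. This is only an illustration: the `scores` list and the helper names `z_score` and `z_to_percentile` are made up for the example, and the percentile step assumes the scores are approximately normally distributed.

```python
from statistics import NormalDist, mean, stdev

def z_score(x, sample):
    """(X - mean) / SD, using the sample mean and sample standard deviation."""
    return (x - mean(sample)) / stdev(sample)

def z_to_percentile(z):
    """Percentage of a standard normal distribution that lies below z."""
    return NormalDist().cdf(z) * 100

# Hypothetical test scores, used only for illustration.
scores = [60, 70, 80, 90, 100]
z = z_score(90, scores)
percentile = z_to_percentile(z)
print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

A score equal to the sample mean always has z = 0, which corresponds to the 50th percentile.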


• A sampling distribution is a hypothetical distribution.
• It is what you would get if you selected an infinite number of samples from a population, calculated a particular statistic (e.g. a mean) for each sample, and plotted all these calculated statistics as a frequency histogram.
• Point estimate
  o A single-figure estimate of an unknown number
• Interval estimate
  o A range within which we think the unknown number will fall
• Confidence interval
  o A statistically determined interval estimate of a population parameter
• Standard error
  o The standard deviation of a particular sampling distribution. In the context of the sampling distribution of the mean, the standard error is the standard deviation of all of the sample means.
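A small simulation can make the sampling distribution of the mean and its standard error concrete. The sketch below is an assumption-laden illustration: the population (normal, mean 100, SD 15), the sample size of 25, and the 2,000 repetitions are all invented for the example, and a finite number of samples stands in for the "infinite number" in the definition.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 scores.
population = [random.gauss(100, 15) for _ in range(10_000)]

n = 25  # size of each sample
# Draw many samples and keep the mean of each one.
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# Standard error = standard deviation of the sample means.
empirical_se = stdev(sample_means)
# For comparison: population SD divided by the square root of n.
theoretical_se = pstdev(population) / n ** 0.5

print(f"SD of sample means: {empirical_se:.2f}")
print(f"population SD / sqrt(n): {theoretical_se:.2f}")
```

The two printed values should be close to each other, which is the sense in which the standard error is "the standard deviation of all of the sample means".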
• Kurtosis & skewness
  o Analyze > Descriptive Statistics > Explore
  o Check skewness and kurtosis option

• If the skewness statistic is higher than +1 = positively skewed
• If the statistic is lower than -1 = negatively skewed
• Any number between +1 & -1 = symmetrical.
• If the kurtosis statistic is positive and higher than the standard error multiplied by 3, then it is leptokurtic.
• If the kurtosis statistic is negative and its magnitude is higher than the standard error multiplied by 3, then it is platykurtic.
• If the magnitude of the kurtosis statistic is lower than the standard error multiplied by 3, then it is mesokurtic.
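These rules of thumb can be written as two small functions. This is a sketch, not SPSS behavior: the function names are hypothetical, and the kurtosis rule is read in the usual way, where the statistic counts as notable only when its magnitude exceeds three times its standard error.

```python
def classify_skewness(skewness):
    """Apply the +1 / -1 rule of thumb to a skewness statistic."""
    if skewness > 1:
        return "positively skewed"
    if skewness < -1:
        return "negatively skewed"
    return "symmetrical"

def classify_kurtosis(kurtosis, std_error):
    """Compare the kurtosis statistic with three times its standard error."""
    if abs(kurtosis) <= 3 * std_error:
        return "mesokurtic"  # not notably different from the normal curve
    return "leptokurtic" if kurtosis > 0 else "platykurtic"

# Hypothetical values, as they might be read off a descriptives table.
print(classify_skewness(1.4))
print(classify_kurtosis(2.0, 0.5))
print(classify_kurtosis(0.9, 0.5))
```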
• A sampling procedure that gives every element of the population a (known) nonzero chance of being selected in the sample is called probability sampling. Otherwise, the sampling procedure is called non-probability sampling.
• Whenever possible, probability sampling is used because there is no objective way of assessing the reliability of inferences under non-probability sampling.
Probability sampling methods:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
Non-probability sampling methods:
1. Accidental or convenience sampling
2. Purposive sampling
3. Quota sampling
4. Snowball or referral sampling
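The four probability sampling methods can be sketched with Python's standard `random` module. Everything below is illustrative: the function names, the population of the numbers 1 to 100, and the strata and clusters are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start, then every k-th unit (assumes n <= len(population))."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction within each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)  # seeded generator so the draws are repeatable
population = list(range(1, 101))
strata = {"first_60": population[:60], "last_40": population[60:]}
clusters = {"a": [1, 2, 3], "b": [4, 5], "c": [6, 7, 8, 9]}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)
clustered = cluster_sample(clusters, 2, rng)
```

In each function every unit has a known, nonzero chance of being selected, which is what separates these methods from convenience, purposive, quota, and snowball sampling.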
• Estimation
Concept of Estimation
As its name suggests, the objective of
estimation is to determine the approximate
value of a population parameter on the basis
of a sample statistic. An estimator of a
population parameter is a random variable
that is a function of the sample data.
An estimate is the calculation of a specific value of
this random variable. The sample mean is the
estimator of the population mean. The sample
standard deviation is the estimator of the
population standard deviation. Once the sample
has been drawn and the statistics calculated, the
value of the sample mean becomes the estimate
of the population mean, and the value of the
sample standard deviation becomes the estimate
of population standard deviation.
We can use sample data to estimate a population
parameter in two ways. First, we can make a
point estimate. We can compute the value of the
estimator and consider that value as the
estimate of the parameter. We are using one
value or point to infer the value of the
parameter. When drawing inferences about a
population, our intuition tells us that the larger
the sample, the more accurate the results. However,
a point estimate on its own does not reflect the
effect of a larger sample size.
• A second way of estimating a population parameter is to make an interval estimate. We estimate the value of the parameter by reference to an interval rather than a single point. The interval estimator is affected by the sample size.
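To see how an interval estimate responds to sample size, here is a minimal sketch of a confidence interval for a mean. It rests on assumptions not stated in the slides: it uses the normal (z) approximation rather than the t distribution, and the function name and data are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Interval estimate of the population mean (normal approximation)."""
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    # z value that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical data: the large sample repeats the small one 20 times,
# so both have the same point estimate (a mean of 50).
small = [48, 52, 50, 47, 53]
large = small * 20

small_low, small_high = mean_confidence_interval(small)
large_low, large_high = mean_confidence_interval(large)
print(f"n = {len(small)}: {small_low:.2f} to {small_high:.2f}")
print(f"n = {len(large)}: {large_low:.2f} to {large_high:.2f}")
```

Both samples give the same point estimate, but the interval from the larger sample is narrower, which is the effect of sample size that a point estimate alone does not show.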
• Another way of drawing inferences about a population is through hypothesis testing. The objective of hypothesis testing is to determine whether or not the sample data support some belief or hypothesis about the population.
1. Formulate the hypothesis
Types of hypotheses:
a. Null Hypothesis (Ho)
b. Alternative hypothesis (Ha)
The null hypothesis is usually one that a
researcher wants to reject. If we reject
the null hypothesis (that is, declare it to be
false), we accept the alternative
hypothesis.
The null hypothesis hypothesizes “no
difference”, or “no effect”, or “no
relationship”. The null hypothesis may also
specify that the value of a parameter of
interest is some value.
The alternative hypothesis is the hypothesis
the researcher wants to prove. The
purpose of the alternative hypothesis is to
determine whether or not the evidence
provided by the sample is enough to
establish that the null hypothesis is not
true. If there is enough such evidence,
then we will say that there is evidence to
support the alternative hypothesis.
• There are several types of null hypotheses that we will discuss in this course. These may be categorized according to problem objective: (1) description of a single population; (2) comparison of two populations; (3) analysis of the relationship between two variables.
a. If we are interested in determining whether or
not the parameter is not equal to the value of
the parameter specified by the null
hypothesis, the alternative hypothesis will use
the “not equal” sign.
b. If we want to determine whether or not the
sample data allow us to conclude that the
parameter exceeds the value of the parameter
specified by the null hypothesis, the
alternative hypothesis will use the “greater
than” sign.
c. If we want to determine whether or not the
sample data allow us to conclude that the
parameter is less than the value of the parameter
specified by the null hypothesis, the
alternative hypothesis will use the “less
than” sign.
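The three forms of the alternative hypothesis can be written compactly. The symbols are assumptions for illustration: θ stands for the population parameter of interest and θ₀ for the value specified by the null hypothesis.

```latex
\begin{align*}
\text{(a) not equal:}    \quad & H_0: \theta = \theta_0 \quad \text{vs.} \quad H_a: \theta \neq \theta_0 \\
\text{(b) greater than:} \quad & H_0: \theta = \theta_0 \quad \text{vs.} \quad H_a: \theta > \theta_0 \\
\text{(c) less than:}    \quad & H_0: \theta = \theta_0 \quad \text{vs.} \quad H_a: \theta < \theta_0
\end{align*}
```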
2. Selecting the Statistical Analysis Model to
be used
Having specified the null and alternative
hypotheses, we then select the
appropriate test statistic or statistical
model to be used. The choice of our
statistic depends on a number of
factors: (1) the nature of the hypothesis
problem; (2) the level of measurement;
and (3) assumptions of normality.
• The z-test for a single-sample test of the mean
• The t-test for a single-sample test of the mean
• The z-test for a single-sample test of proportion
• The z-test for a difference of means test
• The t-test for a difference of means test
• The z-test for a difference of proportions test
• Pearson’s product moment correlation coefficient test
• Spearman’s rank correlation coefficient test
• Chi-square test of independence
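As one concrete case from this list, the z-test for a single-sample test of the mean might be sketched as follows. The function name and the numbers are hypothetical, and the sketch assumes the population standard deviation is known, which is the situation in which a z-test rather than a t-test is usually chosen.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Return the z statistic and the two-tailed p-value."""
    standard_error = population_sd / n ** 0.5
    z = (sample_mean - hypothesized_mean) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical study: 100 people with a mean score of 103,
# tested against a null-hypothesis mean of 100 with a known SD of 15.
z, p = one_sample_z_test(sample_mean=103, hypothesized_mean=100,
                         population_sd=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```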
3. Setting criteria for rejecting the null hypothesis
a. Selecting a significance level
b. Determining the area of rejection

Important Terms:
• p-value: the probability of obtaining the pattern of results we found in our study (or a more extreme one) if there were no relationship in the population between the variables in which we were interested.
• Alpha: the criterion for statistical significance that we set for our analyses.
[Figure: a probability scale shown against a percentile scale running from 0 to 100]
• A Type I error is where you decide to reject the null hypothesis when it is in fact true in the underlying population.
• A Type II error is where you conclude that there is no effect in the population when in reality there is an effect in the population. It represents the case when you do not reject the null hypothesis when in fact you should, because in the underlying population the null hypothesis is not true.
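A simulation can show what a Type I error looks like in practice. In this sketch the null hypothesis is true by construction, so every rejection is a Type I error. The setup is assumed purely for illustration: a two-tailed one-sample z-test, a standard normal population, samples of 30, and 4,000 repeated studies.

```python
import random
from statistics import NormalDist, mean

def rejects_null(sample, mu0, sigma, alpha):
    """Two-tailed one-sample z-test: True if Ho (mean = mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

rng = random.Random(1)  # seeded so the illustration is repeatable
alpha, mu0, sigma, n, trials = 0.05, 0.0, 1.0, 30, 4000

# Each sample is drawn from a population whose mean really is mu0,
# so the null hypothesis is true in every trial.
false_rejections = sum(
    rejects_null([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma, alpha)
    for _ in range(trials)
)
type_i_rate = false_rejections / trials
print(f"Share of true null hypotheses rejected: {type_i_rate:.3f}")
```

The printed share should land near the chosen alpha of 0.05.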
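[Figure: two overlapping distributions around their means, curve 1 where the null hypothesis is false and curve 2 where the null hypothesis is true, with the Alpha and Beta regions marked. Caption: "Hence, we agree to have 0.05 significance level or rejection level." The values 0.01 and 0.10 are also marked.]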
• Between-subject design
  o Analyze > General Linear Model > Univariate
• Within-subject design
  o Analyze > General Linear Model > Repeated Measures
• At the 0.05 significance level, if Ho is true and the test or measurement gets repeated many times, about 95% of the repetitions are expected not to reject Ho.

• If Sig. is lower than 0.05, then reject Ho.
• If Sig. is higher than 0.05, then accept (do not reject) Ho.
• The level of significance refers to the probability of rejecting the null hypothesis when it is true. This is called a Type I error or alpha error.
• We select the level of significance before we compute the test statistic.
We need to select a level of significance that we
think is reasonable. The decision as to which
significance level to use depends on the question
involved. Social scientists routinely accept a
probability of .05 for rejecting the null
hypothesis. If a statistical test would lead to
significant policy recommendations, then you
may wish to reduce the risk of being in error and
specify a significance level of .01 or .001.
Based on the significance level we choose,
we then delineate our region of
acceptance and region of rejection. The
region of rejection is also called the
critical region. Outcomes falling here
mean we reject the null hypothesis. Our
critical region will also depend on
whether we are doing a right-tailed test,
a left-tailed test or a two-tailed test.
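The way the critical region depends on the significance level and on the type of test can be sketched for a z statistic. The function name and the tail labels are hypothetical, and the sketch assumes the test statistic follows a standard normal distribution under the null hypothesis.

```python
from statistics import NormalDist

def in_critical_region(z, alpha=0.05, tail="two"):
    """True if the z statistic falls in the region of rejection."""
    std_normal = NormalDist()
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)  # about +1.645 at alpha = .05
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)  # about -1.645 at alpha = .05
    if tail == "two":
        # alpha is split between both tails, about +/-1.96 at alpha = .05
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left' or 'two'")

# The same statistic can be rejected under one test and not under another.
print(in_critical_region(1.7, tail="right"))
print(in_critical_region(1.7, tail="two"))
```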
• The p-value simply gives an indication of the likelihood of finding such a relationship if the null hypothesis were true.
• The p-value is the probability that we will get a relationship of at least the obtained magnitude if the null hypothesis were true. Alpha is the cut-off we compare it against.
• We do not know what the probability is that the research hypothesis is correct; the p-value is conditional upon the null hypothesis being true and has nothing to do with the truth or falsity of the research hypothesis.
4. Analysis
The analysis part is the process of computing
our test statistic based on the
assumptions we made and the data we have.
5. Making a decision
In assessing the null hypothesis, we can either
accept the null hypothesis or reject it in favor of
the alternative hypothesis. Our decision will be
based on the value of the test statistic we obtain
in the analysis stage. If the value of the test
statistic is located in the critical region, we
reject the null hypothesis in favor of the
alternative hypothesis. Our findings may be
taken as conclusive even though there is some
probability that we are in error.
If the test statistic is located in the
acceptance region, we accept the null
hypothesis but our findings are not
conclusive. We simply do not have evidence
to prove our alternative hypothesis.
• This may be compared to a judge who, upon making a decision, for instance, hands down a verdict of “guilty”. When the judge makes this pronouncement, the defendant is conclusively declared as guilty and he serves the sentence imposed even if there is the probability he is actually not guilty (only God and the defendant can really say whether he is guilty or not).
But, when the judge hands down the verdict of
“not guilty”, it usually is not because it has
been proven, beyond reasonable doubt, that
he is not guilty. There simply is not enough
evidence to prove the defendant guilty.
If our alternative hypothesis involves the greater than sign, we use a right-tailed test.
When our alternative hypothesis involves the less than sign, we use a left-tailed test.
When our alternative hypothesis involves the not equal sign, we will use a two-tailed test.
o If the hypothesis suggests a positive outcome (e.g. greater, higher…): right-tailed test
o If the hypothesis suggests a negative outcome (e.g. lesser, weaker…): left-tailed test
o If the hypothesis suggests no or two outcomes (e.g. change…): two-tailed test
• A directional test of hypothesis is considered if the direction of the difference is stated. This test makes use of the one-tailed test or one-sided test.
• The test is considered non-directional if the direction or the nature of the difference is not stated. The non-directional test makes use of the two-tailed test.
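The practical difference between a directional and a non-directional test shows up in the p-value. The sketch below assumes a z statistic and uses a hypothetical function name and hypothetical labels for the three forms of the alternative hypothesis.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value of a z statistic for a given form of the alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "greater":  # directional, right-tailed
        return 1 - cdf(z)
    if alternative == "less":  # directional, left-tailed
        return cdf(z)
    if alternative == "not equal":  # non-directional, two-tailed
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'not equal'")

z = 1.8  # hypothetical test statistic
one_tailed = p_value(z, "greater")
two_tailed = p_value(z, "not equal")
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

For the same statistic the two-tailed p-value is twice the one-tailed value, so a result can be significant at 0.05 under a directional test and not under a non-directional one.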
Below are hypotheses stated in different ways.
For each, answer the following questions:
a. In what form is the hypothesis stated?
b. Does it use a directional or non-directional
test?
c. Convert the non-directional into a
directional Ha and vice versa.
1. The educational attainment of faculty
members of XXX is not significantly
related to teaching performance.
2. Fourth year students of private schools
perform better in RLE than those from
the public schools.
3. Nurses from private schools are more
competent than those from the
government hospitals.
4. There is a relationship between the life
style and hypertension among males with
ages 40-60 years.
5. There is a significant difference between the
academic achievement of male and female
students in XXX.
6. Students with low scores in the nursing review
course also have low scores in the nursing
licensure exam.
7. The prevalence of skin diseases among children
is dependent on the kind of habitat where they
live.
8. The different seminars and advertisements on E-
VAT affect the attitude of taxpayers in the
Philippines.
9. The family’s socio-economic status affects the
academic achievement of XXX students.
10. There is no significant relationship between
the amount of tar and the amount of nicotine in
several brands of cigarettes.
