STATISTICS AND PROBABILITY Unit 1-2

1. The document discusses key concepts in statistics and probability, including probability distributions, hypothesis testing, and statistical methods. Probability distributions describe how probabilities are distributed over the possible values of random variables. Hypothesis testing involves testing a null hypothesis against an alternative hypothesis using statistical methods and sample data.
2. Common hypothesis tests discussed are the z-test, for large sample sizes, and the t-test, for small sample sizes. Formulas are provided for conducting one-sample and two-sample z-tests and t-tests.
3. Statistical methods of bivariate analysis mentioned include correlation, regression analysis, and cross tabulation for analyzing the relationship between two variables.

Uploaded by

theresaperez298
Copyright
© All Rights Reserved

STATISTICS AND PROBABILITY

Probability Distribution, Hypothesis Testing, and Statistical Methods and Tools Used in Research

1 PROBABILITY DISTRIBUTION

In Statistics and Probability, a probability distribution is the mathematical function that
gives the probabilities of occurrence of the different possible outcomes of an experiment. For a
random variable (a variable whose values are the numerical outcomes of a random experiment), the
probability distribution describes how the probabilities are distributed over the values of the
random variable. Random variables can be discrete or continuous.

Table 1.1 Difference between Discrete Random Variable and Continuous Random Variable
Discrete Random Variable: A random variable is a discrete random variable if the set of possible
outcomes is countable or no other outcome exists between two consecutive outcomes, such as
(1, 2, 3, 4, 5).
Continuous Random Variable: A random variable is a continuous random variable if it takes on
values on a continuous scale, such as heights, weights, and temperature.

Therefore, the probability distribution of a random variable is either a discrete probability
distribution or a continuous probability distribution.
• A discrete distribution describes the probability of occurrence of each value of a discrete
random variable.
• A continuous distribution describes the probability of the possible values of a continuous
random variable.
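
The contrast between the two kinds of distribution can be sketched in Python. This is an illustrative sketch only: the fair die, the normal model for heights (mean 170, standard deviation 5), and the names die_pmf, normal_cdf, and p_within_one_sd are assumptions made for the example and are not part of the handout.

```python
import math

# Discrete case (assumed example): a fair six-sided die.
# Each countable outcome gets its own probability.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable with mean mu and sd sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Discrete probabilities add up to 1, and events are sums over outcomes.
total = sum(die_pmf.values())
p_even = die_pmf[2] + die_pmf[4] + die_pmf[6]

# Continuous case (assumed example): heights modelled as normal(170, 5).
# Probability is attached to intervals, not to single values.
p_within_one_sd = normal_cdf(175, 170, 5) - normal_cdf(165, 170, 5)

print(total, p_even, p_within_one_sd)
```

For the discrete variable, probabilities are read off value by value; for the continuous variable, they are obtained for a range of values.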

2 HYPOTHESIS TESTING

A hypothesis is a proposed explanation, assertion, or assumption about a particular
parameter or about the distribution of a random variable. It is tested using statistical methods,
generally on sample data from an experiment. Analysts use a random sample from the population to
test two different hypotheses: the null hypothesis and the alternative hypothesis.

Table 2.1 Difference between Null Hypothesis and Alternative Hypothesis
Null Hypothesis (H0): It is a statement of “zero” difference. It assumes that there is no
significant difference between the population mean and the sample mean, or between the variables
being compared.
Alternative Hypothesis (Ha): It is a statement that assumes there is a significant difference
between the two means or variables under test or investigation.

How does hypothesis testing work?

Steps of Hypothesis Testing:


1. State the null and alternative hypotheses so that only one can be true.
2. Formulate an analysis plan which outlines how the data will be evaluated.
3. Carry out the plan and analyze the sample data.
4. Analyze the results, and either reject the null hypothesis or state that the null hypothesis is
plausible, given the data.
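
The four steps above can be traced in a short Python sketch of a two-sided one-sample z-test. The function name one_sample_z_test, the significance level of 0.05, and the sample figures (hypothesized mean 50, known standard deviation 10, n = 36, sample mean 53.5) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_h0)."""
    # Step 1: H0 says the population mean equals mu0; Ha says it differs.
    # Step 2: the analysis plan is a z-test at significance level alpha.
    # Step 3: compute the test statistic from the sample data.
    z = (sample_mean - mu0) / sigma * sqrt(n)
    # Step 4: compare the two-sided p-value with alpha.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical data, for illustration only.
z, p_value, reject_h0 = one_sample_z_test(53.5, 50, 10, 36)
print(z, p_value, reject_h0)
```

When reject_h0 is False, the conclusion is that the null hypothesis remains plausible given the data, not that it has been proven.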

3 Z-TEST AND T-TEST

The z-test is used with large samples (n ≥ 30) to assess whether two means or proportions
differ significantly. The data are assumed to come from a normal population whose variance is
known.
On the other hand, the t-test is used when n < 30 and only the sample standard deviation is
available as an estimate of the population standard deviation. If the sample size is small, the
sampling distribution of the standardized sample mean is no longer well approximated by the
standard normal distribution. Therefore, the t-distribution table is needed to form a
conclusion.
Table 3.1 Notations in z-test and t-test and their definitions
NOTATION    DEFINITION
μ           Population mean
x̄           Sample mean
σ           Population standard deviation
s           Sample standard deviation
n           Sample size
s²          Sample variance
α           Significance level
df          Degrees of freedom

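Z-TEST FORMULAS USED IN HYPOTHESIS TESTING

Hypothesis testing is a statistical process of decision-making for evaluating claims about a
population based on the characteristics of a sample coming from that population.

1. For one-sample group
• Standard score of the sample mean using the population mean:
$$z = \frac{\bar{x} - \mu}{\sigma}\sqrt{n}$$

2. For two-sample groups with sizes n1 and n2
• Two sample means with n1 = n2 = n:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad\text{or}\quad z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$$
• Two sample means with n1 ≠ n2:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
• Two population means with n1 ≠ n2:
$$z = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

T-TEST FORMULAS USED IN HYPOTHESIS TESTING

1. One-sample t-test:
$$t = \left(\frac{\bar{x} - \mu}{s}\right)\sqrt{n}$$
where df = n − 1

2. Two-sample t-test:
• Two sample means with n1 = n2 or n1 ≠ n2 with unequal variance
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where df = n1 + n2 − 2
• Two sample means with n1 = n2 = n with equal variance
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$$
where df = 2n − 2
• Two population means with n1 = n2
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{n_1}}}$$
where df = n1 + n2 − 2
• Two sample means with n1 ≠ n2 with equal variance (pooled variance)
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where df = n1 + n2 − 2
• Two population means with n1 ≠ n2
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where df = n1 + n2 − 2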
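
As a rough illustration of how the one-sample and two-sample statistics are computed, the following Python sketch evaluates three of the formulas on made-up summary figures. The function names and all the numbers are hypothetical. The sketch stops at the test statistic and degrees of freedom; the critical value would still be read from a z or t table.

```python
from math import sqrt

def one_sample_t(x_bar, mu, s, n):
    """One-sample statistic ((x_bar - mu) / s) * sqrt(n), with df = n - 1."""
    return (x_bar - mu) / s * sqrt(n), n - 1

def two_sample_separate(x1, x2, s1, s2, n1, n2):
    """Two-sample statistic using separate variances s1^2/n1 + s2^2/n2."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

def two_sample_pooled(x1, x2, s1, s2, n1, n2):
    """Two-sample statistic using the pooled variance, with df = n1 + n2 - 2."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    stat = (x1 - x2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))
    return stat, n1 + n2 - 2

# Hypothetical summary statistics, for illustration only.
t_one, df_one = one_sample_t(52, 50, 4, 16)
stat_separate = two_sample_separate(20, 18, 3, 3, 10, 15)
t_pooled, df_pooled = two_sample_pooled(20, 18, 3, 3, 10, 15)
print(t_one, df_one, stat_separate, t_pooled, df_pooled)
```

In this example the two sample standard deviations are equal, so the separate-variance and pooled statistics coincide; they differ once s1 and s2 differ.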
4 STATISTICAL METHODS AND TOOLS USED IN RESEARCH

Statistics is a term that pertains to the collection and analysis of numerical data.
Doing statistics means performing arithmetic procedures such as addition, subtraction,
multiplication, and other mathematical calculations. In connection with this, statistical methods
are the ways of gathering, analyzing, and interpreting variables or numerical data.
Statistical Methods of Bivariate Analysis
Bivariate analysis is one of the types of statistical analysis of variables in quantitative
research. It refers to the analysis of two variables, one independent and one dependent.
Bivariate analysis is carried out by means of the following methods:

1. Correlation – describes the relationship between the two variables and also tests the
strength or significance of their linear relation.
2. Cross Tabulation – By displaying the frequency and percentage distribution of the data,
cross tabulation helps explain the relationship between two variables and the
effect of one variable on the other variable.

Measure of Correlation
1. Correlation Coefficient – This is a measure of the strength and direction of the linear
relationship between variables and likewise gives the extent of dependence between two
variables. This is determined through the following statistical tests:

• Spearman’s Rank Correlation Coefficient (Spearman’s rho) – measures how strongly the
dependent variable depends on the independent variable, based on the ranks of the observations.

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where
d_i is the difference between the two ranks of each observation
n is the number of observations

• Pearson product-moment correlation (Pearson’s r) – measures the strength and
direction of the linear relationship between two variables measured on an interval or ratio
scale.

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
where
x_i is a value of the x-variable in a sample
x̄ is the mean of the values of the x-variable
y_i is a value of the y-variable in a sample
ȳ is the mean of the values of the y-variable
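
Both correlation coefficients can be computed directly from their definitions. The Python sketch below is illustrative only: the helper names ranks, spearman_rho, and pearson_r and the five data pairs are invented for the example, and the rank helper assumes there are no tied values.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

def pearson_r(x, y):
    """r = sum((x_i - x_bar)(y_i - y_bar)) / sqrt(Sxx * Syy)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because the example values are already the ranks 1 to 5, the two coefficients agree here; with raw measurements they generally differ.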
• Chi-square – is the statistical test for bivariate analysis of nominal variables,
used specifically to test the null hypothesis. It tests whether or not a relationship exists
between or among variables and tells the probability that the relationship is caused
by chance.

For a table other than 2x2:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

For a 2x2 table (chi-square formula with Yates’s correction factor):

$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$

where
O is the observed frequency
E is the expected frequency
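
The chi-square statistic can be sketched in Python as follows. The handout does not say how E is obtained; the sketch assumes the usual rule for a contingency table, E = (row total × column total) / grand total. The function name chi_square and the 2x2 counts are hypothetical.

```python
def chi_square(observed, yates=False):
    """Chi-square statistic for a table of observed counts (list of rows).

    Assumption: expected counts are row total * column total / grand total.
    With yates=True, 0.5 is subtracted from |O - E| (intended for 2x2 tables).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(o - e) - 0.5 if yates else o - e
            statistic += diff**2 / e
    return statistic

# Hypothetical 2x2 table of observed frequencies, for illustration only.
table = [[10, 20],
         [20, 10]]
plain = chi_square(table)
corrected = chi_square(table, yates=True)
print(plain, corrected)
```

The corrected value is smaller than the uncorrected one, which is the intended effect of the correction factor on a 2x2 table.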
• t-test – evaluates the probability that the mean of the sample reflects the mean of the
population from which the sample was drawn. ANOVA, or analysis of variance, extends the
comparison of means to several groups: it uses the F-ratio to compare the variation between
groups with the variation within groups. ANOVA is of various types, such as the
following:
a) One-way ANOVA – study of the effects of the independent variable

$$F = \frac{MST}{MSE}$$

$$MST = \frac{\sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{n}}{k - 1}$$

$$MSE = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}}{n - k}$$

where
F is the variance ratio for the overall test
MST is the mean square due to treatments/groups (between groups)
MSE is the mean square due to error (within groups, residual mean square)
Y_ij is an observation
T_i is a group total
G is the grand total of all observations
n_i is the number in group i
n is the total number of observations
k is the number of groups

b) ANCOVA (Analysis of Covariance) – study of two or more dependent variables
that are correlated with one another
c) MANCOVA (Multivariate Analysis of Covariance) – multiple analyses of one or
more independent variables and one dependent variable to see if the
independent variables affect one another
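
The one-way ANOVA formulas (F, MST, and MSE) translate almost line by line into Python. In this illustrative sketch the function name one_way_anova_f and the three small groups of observations are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, MST, MSE) for a list of groups of observations."""
    k = len(groups)                                          # number of groups
    n = sum(len(g) for g in groups)                          # total observations
    group_terms = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of T_i^2 / n_i
    grand_total = sum(sum(g) for g in groups)                # G
    sum_of_squares = sum(y**2 for g in groups for y in g)    # sum of Y_ij^2
    mst = (group_terms - grand_total**2 / n) / (k - 1)       # between groups
    mse = (sum_of_squares - group_terms) / (n - k)           # within groups
    return mst / mse, mst, mse

# Hypothetical observations in three groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, mst, mse = one_way_anova_f(groups)
print(f_ratio, mst, mse)
```

A large F-ratio means the group means differ by more than the within-group variation would suggest; the decision is made by comparing it with the F table for k − 1 and n − k degrees of freedom.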

2. Regression – it determines the existence of variable relationships, but goes further by
determining the following: (1) which of the independent and dependent variables can
signal the presence of the other; (2) how strong the relationship between the two
variables is; and (3) when an independent variable is statistically significant as a
predictor.

LINEAR REGRESSION and FORMULA FOR LINEAR REGRESSION


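Linear regression is an approach to determine the relationship between a scalar
variable Y and one or more explanatory variables denoted as X. For a single explanatory
variable, the fitted line is
$$Y = mX + b$$
where m is the slope and b is the y-intercept.
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b = \bar{y} - m\bar{x}$$
where ȳ is the mean of Y, x̄ is the mean of X, and n is the number of (x, y) pairs.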
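
The slope and intercept can be computed from the standard least-squares expressions for a single explanatory variable. In this minimal Python sketch the function name fit_line and the data points are hypothetical, chosen so that the points lie exactly on a line.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for Y = mX + b."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi**2 for xi in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    b = sum_y / n - m * sum_x / n  # b = y_bar - m * x_bar
    return m, b

# Hypothetical data lying exactly on Y = 2X + 1, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
m, b = fit_line(x, y)
print(m, b)
```

With real data the points scatter around the line, and m and b describe the line that minimizes the sum of squared vertical distances to the points.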
