STATISTICS AND PROBABILITY Unit 1-2
Probability Distribution, Hypothesis Testing, and Statistical Methods and Tools Used in Research
1 PROBABILITY DISTRIBUTION
Table 1.1 Difference between Discrete Random Variable and Continuous Random Variable

Discrete Random Variable: A random variable is a discrete random variable if the set of possible outcomes is countable and no other outcome exists between two consecutive outcomes, such as (1, 2, 3, 4, 5).

Continuous Random Variable: A random variable is a continuous random variable if it takes on values on a continuous scale, such as heights, weights, and temperature.
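The distinction can be illustrated with a short Python sketch (the die and the height figures below are illustrative examples, not values from the module):

```python
import random

# Discrete random variable: a countable set of outcomes,
# e.g. the result of one roll of a fair six-sided die.
die_outcomes = [1, 2, 3, 4, 5, 6]
die_mean = sum(die_outcomes) / len(die_outcomes)  # expected value of the roll

# Continuous random variable: values on a continuous scale, e.g. height in cm.
# random.gauss draws from a normal distribution, so any real value can occur
# (the mean 165 cm and sd 7.5 cm here are hypothetical).
height = random.gauss(165.0, 7.5)

print(die_mean)  # 3.5
```

The die has exactly six possible outcomes, while the simulated height can take any real value in a range, which is the countable-versus-continuous contrast in Table 1.1.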
2 HYPOTHESIS TESTING
How does hypothesis testing work?
The z-test is used with large samples (n > 30) to assess whether two means or proportions differ significantly. The data are assumed to come from a normal population whose variance is known.

The t-test, on the other hand, is used when n < 30 and only the sample standard deviation is available as an estimate of the population standard deviation. When the sample size is small, the sampling distribution of the sample mean is no longer approximately normal, so the t-distribution table must be used to form a conclusion.
Table 3.1 Notations in z-test and t-test and their definitions

NOTATION   DEFINITION
μ          Population mean
x̄          Sample mean
σ          Population standard deviation
s          Sample standard deviation
n          Sample size
s²         Sample variance
α          Significance level
df         Degrees of freedom
Test statistic formulas

z-test (population variances known):
1. One sample mean: z = (x̄ − μ) / (σ/√n)
2. Two sample means with n1 = n2 = n: z = (x̄1 − x̄2) / √((σ1² + σ2²)/n)
3. Two sample means with n1 ≠ n2: z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)

t-test (only sample standard deviations known):
1. One-sample t-test: t = (x̄ − μ) / (s/√n), where df = n − 1
2. Two sample means with n1 = n2 = n and equal variances:
   t = (x̄1 − x̄2) / √((s1² + s2²)/n), where df = 2n − 2
3. Two sample means with n1 ≠ n2 (pooled variance):
   t = (x̄1 − x̄2) / [ √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)) · √(1/n1 + 1/n2) ],
   where df = n1 + n2 − 2
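The pooled two-sample t statistic can be computed directly from its formula. A minimal sketch (the function name is mine, not from the module):

```python
import math

def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance (n1 may differ from n2).

    Implements t = (x̄1 − x̄2) /
        [ √(((n1−1)s1² + (n2−1)s2²)/(n1+n2−2)) · √(1/n1 + 1/n2) ]
    with df = n1 + n2 − 2.
    """
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    # Sample variances with the n − 1 (Bessel) divisor.
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    # Pooled standard deviation.
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df
```

In practice the computed t is compared against the critical value from the t-distribution table at the chosen significance level α and the returned df.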
4 STATISTICAL METHODS AND TOOLS USED IN RESEARCH
Statistics is a term that pertains to the acts of collecting and analyzing numerical data.
Doing statistics means performing some arithmetic procedures such as addition, subtraction,
multiplication, and other mathematical calculations. In connection with this, statistical methods are
the ways of gathering, analyzing, and interpreting variables or numerical data.
Statistical Methods of Bivariate Analysis
Bivariate analysis is one of the types of statistical analysis of variables in quantitative research. It refers to the analysis of two variables, typically an independent and a dependent variable. Bivariate analysis happens by means of the following methods:
1. Correlation – describes the relationship between the two variables and also tests the strength or significance of their linear relation.
2. Cross Tabulation – By displaying the frequency and percentage distribution of the data, cross tabulation explains the relationship of two variables and the effect of one variable on the other.

Measure of Correlation
1. Correlation Coefficient – This is a measure of the strength and direction of the linear relationship between variables and likewise gives the extent of dependence between two variables. This is determined through the following statistical tests:

Spearman's Rank Correlation Coefficient (Spearman's rho) – a rank-based test that measures the dependence of the dependent variable on the independent variable:

ρ = 1 − 6∑di² / (n(n² − 1))

where
di is the difference between the two ranks of each observation
n is the number of observations

Pearson's Correlation Coefficient (Pearson's r) – measures the linear relationship between two variables:

r = ∑(xi − x̄)(yi − ȳ) / √( ∑(xi − x̄)² · ∑(yi − ȳ)² )

where
xi are the values of the x-variable in a sample
x̄ is the mean of the values of the x-variable
yi are the values of the y-variable in a sample
ȳ is the mean of the values of the y-variable
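Both correlation coefficients can be computed directly from their formulas. A minimal sketch (function names are mine; the Spearman version assumes no tied ranks, since the di² formula only holds in that case):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

def spearman_rho(xs, ys):
    """Spearman's rho = 1 − 6Σdi² / (n(n² − 1)), assuming no tied ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A perfectly increasing pair of variables gives r = 1, and a perfectly decreasing one gives rho = −1, matching the interpretation of strength and direction above.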
Chi-square – the statistical test for bivariate analysis of nominal variables, specifically for testing the null hypothesis. It tests whether or not a relationship exists between or among variables and tells the probability that the relationship is caused by chance.
χ² = ∑ (|O − E| − 0.5)² / E
where
O is the observed frequency
E is the expected frequency
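The formula as given includes the 0.5 continuity correction (Yates' correction) in each cell. A minimal sketch of the computation (function name is mine):

```python
def chi_square_corrected(observed, expected):
    """Chi-square with the 0.5 continuity correction: Σ (|O − E| − 0.5)² / E.

    `observed` and `expected` are matching lists of cell frequencies.
    """
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))
```

For example, observed frequencies [10, 20] against expected [15, 15] give (4.5² + 4.5²)/15 = 2.7, which would then be compared with the critical chi-square value for the relevant degrees of freedom.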
t-test – evaluates the probability that the mean of the sample reflects the mean of the population from which the sample was drawn. ANOVA, or analysis of variance, extends this comparison to more than two groups by testing whether the variance between the predicted values and the actual measurements is significant. The ANOVA is of various types, such as the following:
a) One-way ANOVA – study of the effects of the independent variable
F = MST / MSE

MST = [ ∑_{i=1}^{k} (Ti²/ni) − G²/n ] / (k − 1)

MSE = [ ∑_{i=1}^{k} ∑_{j=1}^{ni} Yij² − ∑_{i=1}^{k} (Ti²/ni) ] / (n − k)
where
F is the variance ratio for the overall test
MST is the mean square due to treatments/groups (between groups)
MSE is the mean square due to error (within groups, residual mean square)
Y ij is an observation
T i is a group total
G is the grand total of all observations
ni is the number in group i
n is the total number of observations
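The MST/MSE computation above can be sketched directly in Python using the group totals Ti and the grand total G (function name is mine):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F = MST / MSE from the computational formulas.

    `groups` is a list of lists of observations, one inner list per group.
    """
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    T = [sum(g) for g in groups]           # group totals Ti
    G = sum(T)                             # grand total G
    sum_T2_over_n = sum(t * t / len(g) for t, g in zip(T, groups))
    mst = (sum_T2_over_n - G * G / n) / (k - 1)       # between-groups mean square
    sum_Y2 = sum(y * y for g in groups for y in g)    # ΣΣ Yij²
    mse = (sum_Y2 - sum_T2_over_n) / (n - k)          # within-groups mean square
    return mst / mse
```

For the illustrative groups [1, 2, 3], [2, 3, 4], and [4, 5, 6], MST = 7 and MSE = 1, so F = 7; the variance ratio is then compared with the critical F value for (k − 1, n − k) degrees of freedom.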
b) ANCOVA (Analysis of Covariance) – study of the effect of the independent variable on the dependent variable while controlling for one or more covariates that are correlated with it
c) MANCOVA (Multivariate Analysis of Covariance) – extends ANCOVA to two or more dependent variables analyzed simultaneously
2. Regression – determines the existence of variable relationships, but goes further by determining the following: (1) which independent variable can predict the dependent variable; (2) how strong the relationship between the two variables is; and (3) whether an independent variable is statistically significant as a predictor.
For the least-squares regression line y = mx + b, the intercept is

b = ȳ − m·x̄

where ȳ is the mean of Y and x̄ is the mean of X.
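A minimal sketch of fitting the line, computing the slope m by least squares and then the intercept b = ȳ − m·x̄ (function name is mine):

```python
def least_squares_line(xs, ys):
    """Least-squares fit of y = m·x + b, with intercept b = ȳ − m·x̄."""
    n = len(xs)
    mx = sum(xs) / n           # x̄, mean of X
    my = sum(ys) / n           # ȳ, mean of Y
    # Slope m = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)².
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - m * mx            # intercept from the formula above
    return m, b
```

For the illustrative points (1, 3), (2, 5), (3, 7) the fit recovers m = 2 and b = 1, i.e. the line y = 2x + 1.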