Chapter-Summary of Bacal

Uploaded by Tammy Ogoy

Chapter I

RANDOM VARIABLES AND PROBABILITY DISTRIBUTION

• A random variable is a function that associates a real number with each element in the sample space.
It is a variable whose values are determined by chance.
• A random variable is a discrete random variable if its set of possible outcomes is countable. Mostly,
discrete random variables represent count data, such as the number of defective chairs produced in
a factory.
• A random variable is a continuous random variable if it takes on values on a continuous scale. Often,
continuous random variables represent measured data, such as height, weight, and temperature.
• A discrete probability distribution or a probability mass function consists of the values a random
variable can assume and the corresponding probabilities of the values.

• Properties of a Probability Distribution


1. The probability of each value of the random variable must be between 0 and 1, inclusive. In symbols, 0 ≤ 𝑃(𝑋) ≤ 1.
2. The sum of the probabilities of all values of the random variable must be equal to 1. In symbols, ∑𝑃(𝑋) = 1.

• Formula for the Mean of the Probability Distribution


The mean of a random variable with a discrete probability distribution is
𝜇 = X1 • P(X1) + X2 • P(X2) + X3 • P(X3) + . . . + Xn • P(Xn), or
𝜇 = ∑𝑋 • 𝑃(𝑋)
where:
X1, X2, X3, . . . , Xn are the values of the random variable X, and
P(X1), P(X2), P(X3), . . . , P(Xn) are the corresponding probabilities.

• Steps in Finding the Variance and Standard Deviation


1. Find the mean of the probability distribution.
2. Subtract the mean from each value of the random variable X.
3. Square the results obtained in Step 2.
4. Multiply the results obtained in Step 3 by the corresponding probability.
5. Get the sum of the results obtained in Step 4.

• Formula for the Variance and Standard Deviation of a Discrete Probability Distribution
The variance of a discrete probability distribution is given by the formula:
𝜎² = ∑(𝑋 − 𝜇)² • 𝑃(𝑋)
The standard deviation of a discrete probability distribution is given by the formula:
𝜎 = √[∑(𝑋 − 𝜇)² • 𝑃(𝑋)]


where: X = value of the random variable
P(X) = probability of the random variable X
• Alternative Formula for the Variance and Standard Deviation of a Discrete Probability
Distribution
The variance of a discrete probability distribution is given by the formula:
𝜎² = ∑𝑋² • 𝑃(𝑋) − 𝜇²
The standard deviation of a discrete probability distribution is given by the formula:
𝜎 = √[∑𝑋² • 𝑃(𝑋) − 𝜇²]
where:
X = value of the random variable
P(X) = probability of the random variable X
𝜇 = mean of the probability distribution
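The mean formula and the alternative variance formula can be checked numerically. The sketch below uses a small hypothetical distribution (the values and probabilities are invented for illustration):

```python
# Hypothetical discrete probability distribution: values X and probabilities P(X)
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Property 2: the probabilities must sum to 1
assert abs(sum(probs) - 1) < 1e-9

# Mean: mu = sum of X * P(X)
mu = sum(x * p for x, p in zip(values, probs))

# Variance via the alternative formula: sigma^2 = sum of X^2 * P(X) - mu^2
var = sum(x**2 * p for x, p in zip(values, probs)) - mu**2
sd = var ** 0.5

print(mu, var, sd)  # 1.7, 0.81, 0.9 for this distribution
```

The step-by-step method (subtract the mean, square, weight by probability, sum) gives the same variance; the alternative formula just avoids computing each deviation.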
Chapter II
NORMAL DISTRIBUTION

• A standard normal curve is a normal probability distribution that has a mean 𝜇 = 0 and a standard deviation 𝜎 = 1.

• The normal probability distribution has the following properties:


1. The distribution curve is bell-shaped.
2. The curve is symmetrical about its center.
3. The mean, the median, and the mode coincide at the center.
4. The width of the curve is determined by the standard deviation of the distribution.
5. The tails of the curve flatten out indefinitely along the horizontal axis, always approaching
the axis but never touching it. That is, the curve is asymptotic to the base line.
6. The area under the curve is 1. Thus, it represents the probability or proportion or the
percentage associated with specific sets of measurement values.

• Useful computing formulas:


1. 𝑧 = (𝑋 − 𝜇)/𝜎
2. 𝑋 = 𝜇 + 𝑧𝜎 for computing a raw score above the mean.
3. 𝑋 = 𝜇 + (−𝑧)𝜎 for computing a raw score below the mean.
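The two directions of the computation, raw score to z-score and back, can be sketched with hypothetical values of 𝜇, 𝜎, and 𝑋 (all invented for illustration):

```python
mu, sigma = 50.0, 4.0   # hypothetical population mean and standard deviation
x = 58.0                # hypothetical raw score

z = (x - mu) / sigma    # formula 1: standardize the raw score
back = mu + z * sigma   # formulas 2/3: recover the raw score from z

print(z, back)          # 2.0, 58.0
```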
Chapter III
SAMPLING AND SAMPLING DISTRIBUTION

• Descriptive measures computed from a population are called parameters while descriptive measures
computed from a sample are called statistics.
• The number of samples of size n that can be drawn from a population of size N is given by NCn, the number of combinations of N objects taken n at a time.
• A sampling distribution of sample means is a frequency distribution using the means computed from
all possible random samples of a specific size taken from a population.
• The probability distribution of the sample means is also called the sampling distribution of the sample
means.
• The standard deviation of sampling distribution of the sample means is also known as the standard
error of the mean. It measures the degree of accuracy of the sample mean (𝑋̅) as an estimate of the
population mean (𝜇).

• Steps in Constructing the Sampling Distribution of the Means


1. Determine the number of possible samples that can be drawn from the population using the
formula:

NCn
where N = size of the population
n = size of the sample
2. List all the possible samples and compute the mean of each sample.
3. Construct a frequency distribution of the sample means obtained in Step 2.
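The three steps above can be sketched in Python for a small hypothetical population (the values are invented for illustration):

```python
from itertools import combinations
from collections import Counter
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population, N = 5
n = 2                           # sample size

# Step 1: the number of possible samples is NCn = 5C2 = 10
samples = list(combinations(population, n))

# Step 2: compute the mean of each sample
sample_means = [mean(s) for s in samples]

# Step 3: frequency distribution of the sample means
freq = Counter(sample_means)
for m in sorted(freq):
    print(m, freq[m], freq[m] / len(samples))
```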

• Properties of the Sampling Distribution of Sample Mean


If all possible samples of size n are drawn from a population of size N with mean 𝜇 and
variance 𝜎 2, then the sampling distribution of the sample means has the following properties:
1. The mean of the sampling distribution of the sample means is equal to the population mean
𝜇. That is,
𝜇𝑋̅ = 𝜇
2. The variance of the sampling distribution of the sample means is given by:
• 𝜎²𝑋̅ = (𝜎²/𝑛) • (𝑁 − 𝑛)/(𝑁 − 1) for a finite population; and
• 𝜎²𝑋̅ = 𝜎²/𝑛 for an infinite population.
3. The standard deviation of the sampling distribution of the sample means is given by:
a. 𝜎𝑋̅ = (𝜎/√𝑛) • √[(𝑁 − 𝑛)/(𝑁 − 1)] for a finite population,
where √[(𝑁 − 𝑛)/(𝑁 − 1)] is the finite population correction factor
b. 𝜎𝑋̅ = 𝜎/√𝑛 for an infinite population
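The finite-population formula can be verified numerically: for a small hypothetical population (values invented for illustration), the variance computed with the finite population correction matches the variance of the exact sampling distribution of all NCn sample means.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [2, 4, 6, 8, 10]   # hypothetical population
N, n = len(population), 2

var = pvariance(population)     # population variance sigma^2 (= 8 here)

# Variance of the sampling distribution, with the finite population correction
var_xbar = (var / n) * (N - n) / (N - 1)
se = var_xbar ** 0.5            # standard error of the mean

# Check against the exact sampling distribution of all NCn sample means
sample_means = [mean(s) for s in combinations(population, n)]
print(var_xbar, pvariance(sample_means))   # the two variances agree
```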
• The Central Limit Theorem
If random samples of size n are drawn from a population, then as n becomes larger, the
sampling distribution of the means approaches the normal distribution, regardless of the
shape of the population distribution.
Chapter IV
ESTIMATION OF PARAMETERS

• An estimate is a value that approximates a parameter. It is based on sample statistics computed from
sample data.
• In Statistics, estimation is the process of determining the values of parameters.
• The confidence level of an interval estimate of a parameter is the probability that the interval
estimate contains the parameter. It describes what percentage of intervals from many different
samples contains the unknown population parameter.
• Variance (s²): 𝑠² = ∑(𝑋 − 𝑋̅)²/(𝑛 − 1)
• Standard deviation (s): 𝑠 = √[∑(𝑋 − 𝑋̅)²/(𝑛 − 1)]
• An interval estimate, called a confidence interval, is a range of values that is used to estimate a
parameter. This estimate may or may not contain the true parameter value.
• General formula for confidence intervals for large samples:
𝑋̅ − 𝑧𝛼/2(𝜎/√𝑛) < 𝜇 < 𝑋̅ + 𝑧𝛼/2(𝜎/√𝑛)
• Computing formula for the margin of error E:
𝐸 = 𝑧𝛼/2(𝜎/√𝑛) ≈ 𝑧𝛼/2(𝑠/√𝑛)
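A minimal sketch of the large-sample confidence interval, using hypothetical sample figures (n, the sample mean, and s are invented) and the standard library's NormalDist to look up the confidence coefficient:

```python
from statistics import NormalDist

# Hypothetical sample summary: n = 100 observations, mean 50, standard deviation 5
n, xbar, s = 100, 50.0, 5.0
conf = 0.95
alpha = 1 - conf

# Confidence coefficient z_(alpha/2), about 1.96 for 95% confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

E = z * s / n ** 0.5            # margin of error, using s in place of sigma
lower, upper = xbar - E, xbar + E
print(lower, upper)
```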

• A Four-Step Process in Computing the Interval Estimate


Step 1. Describe the population parameter of interest (e.g., mean 𝜇).
Step 2. Specify the confidence interval criteria.
a. Check the assumptions.
b. Determine the test statistic to be used.
c. State the level of confidence.
Step 3. Collect and present sample evidence.
a. Collect the sample information.
b. Find the point estimate.
Step 4. Determine the confidence interval.
a. Determine the confidence coefficients (for a z or a t-distribution).
b. Find the maximum error E of the estimate.
c. Find the lower and the upper confidence limits.
d. Describe / interpret the results.
• The general expression for the confidence interval when 𝜎 is unknown is 𝑋̅ ± t(s/√n), and the distribution of values is called the t-distribution.
• The formula for computing the confidence interval using the t-distribution is:
𝑋̅ − t(s/√n) < 𝜇 < 𝑋̅ + t(s/√n)

• A proportion is a fraction in which the number of favorable responses is in the numerator and the total number of respondents is in the denominator. The basic operation involved is division; thus, the result is a decimal value that can be expressed as a percent.
• The formula for computing a large-sample confidence interval for a population proportion 𝑝 is:
𝑝̂ − 𝑧𝛼/2√(𝑝̂𝑞̂/𝑛) < 𝑝 < 𝑝̂ + 𝑧𝛼/2√(𝑝̂𝑞̂/𝑛)
• Formula for determining the minimum sample size needed when estimating the population mean:
𝑛 = (𝑧𝛼/2 • 𝜎 / 𝐸)²
• Formula for determining the minimum sample size needed when estimating the population proportion:
𝑛 = 𝑝̂𝑞̂(𝑧𝛼/2 / 𝐸)²
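Both sample-size formulas can be sketched as follows; 𝜎, 𝐸, and 𝑝̂ are hypothetical values chosen for illustration, and the result is rounded up because sample size must be a whole number:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)     # z_(alpha/2) for 95% confidence

# Sample size for estimating a mean: assumed sigma = 10, desired error E = 2
sigma, E = 10.0, 2.0
n_mean = ceil((z * sigma / E) ** 2)

# Sample size for estimating a proportion: p-hat = 0.5 is the most
# conservative guess (it maximizes p-hat * q-hat), desired error 0.03
p_hat, E_p = 0.5, 0.03
n_prop = ceil(p_hat * (1 - p_hat) * (z / E_p) ** 2)

print(n_mean, n_prop)
```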
Chapter V
CONDUCTING HYPOTHESIS TESTING

• Hypothesis testing is a decision-making process for evaluating claims about a population based on the characteristics of a sample purportedly coming from that population. The decision is whether the claim is acceptable or not.
• The null hypothesis, denoted by 𝑯𝟎 , is a statement that there is no difference between a parameter
and a specific value, or that there is no difference between two parameters.
• The alternative hypothesis, denoted by 𝑯𝟏 is a statement that there is a difference between a
parameter and a specific value, or that there is a difference between two parameters.
• When the alternative hypothesis utilizes the ≠ symbol, the test is said to be non-directional.
• When the alternative hypothesis utilizes the > or the < symbol, the test is said to be directional.
• A non-directional test is also called a two-tailed test.
• A directional test may either be left-tailed or right-tailed.
• The rejection region refers to the region where the value of the test statistic lies for which we will reject the null hypothesis. This region is also called the critical region.
• The probability of committing a Type I error is called the significance level of a test.
• For any hypothesis test, the p-value is the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is true.
• Test statistic: 𝑧 = (𝑋̅ − 𝜇)/𝜎𝑋̅ where 𝜎𝑋̅ = 𝜎/√𝑛
• If the test is one-tailed, the p-value is equal to the area beyond z in the same direction as the alternative hypothesis.
• Decision rule for the p-value approach:
o Reject H0 if 𝑝 ≤ 𝛼
o Do not reject H0 if 𝑝 > 𝛼.
• Test statistic:
z = (sample proportion − null hypothesized proportion) / (standard deviation of the sample proportion), that is,
z = (𝑝̂ − 𝑝0)/𝜎𝑝̂ , modified into z = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛)
o For a one-tailed test:
H0: 𝑝 = 𝑝0
H1: 𝑝 > 𝑝0 and the rejection region is 𝑧 > +𝑧𝛼
or (H1: 𝑝 < 𝑝0) and the rejection region is 𝑧 < −𝑧𝛼
o For a two-tailed test:
H0: p = p0
H1: p ≠ p0
The rejection region is 𝑧 < −𝑧𝛼/2 or 𝑧 > 𝑧𝛼/2.
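The proportion test statistic and the two-tailed decision rule can be sketched with hypothetical survey figures (120 favorable responses out of 200, and H0: p = 0.50, are invented for illustration):

```python
from statistics import NormalDist

# Hypothetical survey: 120 favorable responses out of n = 200
x, n = 120, 200
p_hat = x / n                 # sample proportion, 0.60
p0 = 0.50                     # null hypothesized proportion
q0 = 1 - p0

# Test statistic: z = (p-hat - p0) / sqrt(p0 * q0 / n)
z = (p_hat - p0) / (p0 * q0 / n) ** 0.5

# Two-tailed decision at significance level alpha = 0.05
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
reject = abs(z) > z_crit
print(z, reject)
```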
Chapter VI
CORRELATION AND REGRESSION ANALYSIS

• Dependent and Independent Variables


1. Two variables that come in pairs are called bivariate data.
2. If two variables are related, the one that depends upon the other is the dependent variable; the other one is the independent variable. To identify which of the two variables is dependent or independent, we place each variable in the blank space in the sentence
“_______________ depends upon _______________”
and we evaluate whether the variables are logically placed in the sentence.

• Scatterplot
1. A scatterplot is the point-graph of all the scores taken from bivariate data.
2. The trend line is the line “closest to the points”.
3. In interpreting a scatterplot, the strength and direction of association is considered.
✓ If the points are arranged in a “thinner” line, the strength of association is greater.
✓ If the trend of the points is a line that rises to the right, the direction of association is positive; if it falls to the right, the direction is negative.
✓ When the direction of association between two bivariate data is positive, it means that when the value of one variable is high, the other variable is also high, or when the value of one is low, the other is also low.
✓ When the direction is negative, it means that if the value of one variable is low, the other is high, or vice versa.

• Pearson r
1. The Pearson r is the statistic most commonly used to measure the strength of correlation or association between two variables. To compute r, we use the formula:
r = [𝑛∑𝑋𝑌 − (∑𝑋)(∑𝑌)] / √{[𝑛∑𝑋² − (∑𝑋)²][𝑛∑𝑌² − (∑𝑌)²]}
2. The value of the computed r ranges from −1 to +1. If the value is negative, the direction of correlation is negative, and we conclude that the two variables are negatively correlated. If the value is positive, the two variables are positively correlated. If the value of r is zero, there is no correlation between the two variables.
The closer the value of r is to ±1, the stronger the correlation; the closer the value is to 0, the weaker the correlation.
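The computing formula for Pearson r can be applied to a small hypothetical data set (the X and Y values are invented for illustration):

```python
# Hypothetical bivariate data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

# The sums that appear in the computing formula
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)
sy2 = sum(y * y for y in Y)

# Pearson r: [n*sum(XY) - sum(X)*sum(Y)] over the square root of the product
# [n*sum(X^2) - (sum X)^2] * [n*sum(Y^2) - (sum Y)^2]
r = (n * sxy - sx * sy) / ((n * sx2 - sx**2) * (n * sy2 - sy**2)) ** 0.5
print(r)   # about 0.77: a fairly strong positive correlation
```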
• Regression and Prediction
1. If two variables are significantly correlated, we can predict the value of the dependent variable if
we know the value of the independent variable. The process is called regression analysis.
2. In regression analysis, the goal is to determine the regression line that is the same as the trend
line. The regression line is used in prediction; hence it is sometimes called the predictor.
3. The regression line takes the slope-intercept form of the equation of a line in algebra:
Y = bX + a, where 𝑏 is the slope of the line and 𝑎 is the y-intercept, with
a = [(∑𝑌)(∑𝑋²) − (∑𝑋)(∑𝑋𝑌)] / [𝑛(∑𝑋²) − (∑𝑋)²]
b = [𝑛(∑𝑋𝑌) − (∑𝑋)(∑𝑌)] / [𝑛(∑𝑋²) − (∑𝑋)²]
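The formulas for a and b can be sketched with a small hypothetical data set (the X and Y values are invented for illustration), and the fitted line then used for prediction:

```python
# Hypothetical bivariate data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)

# Slope b and y-intercept a; both share the denominator n*sum(X^2) - (sum X)^2
denom = n * sx2 - sx**2
b = (n * sxy - sx * sy) / denom
a = (sy * sx2 - sx * sxy) / denom

def predict(x):
    """Regression line Y = bX + a, used as the predictor."""
    return b * x + a

print(a, b, predict(6))
```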
GLOSSARY

Arithmetic average. In a set of numerical data, this is the value obtained by adding the values of all the cases and then dividing the sum by the total number of cases.
Central Limit Theorem. Given a random sample of n observations selected from a population with mean 𝜇 and standard deviation 𝜎. Then, when n is sufficiently large, the sampling distribution of the means will be approximately a normal distribution with mean 𝜇 and standard deviation 𝜎/√𝑛. The larger the sample size, the closer the sampling distribution of the means will be to the normal distribution.
Confidence coefficient. This is a number that is used in determining an estimate of a population parameter. It is also known as
critical value. It is usually readily available from a table.
Confidence interval. It is a range of values that purportedly contains a population parameter.
Confidence level. This is the confidence statement regarding the interval estimate of the population parameter. It is the
probability that the interval estimate contains the parameter.
Confidence limits. These are the lower and the upper values in a confidence interval. Also known as confidence boundaries.
Continuous random variable. It takes on values on a continuous scale. Often, continuous random variables represent measured
data.
Critical values. These are the confidence coefficients.
Degrees of freedom (df). This is the number of values that are free to vary after a sample statistic has been computed.
Discrete random variable. The set of possible outcomes is countable.
Error. This refers to the difference between an observed value and a parameter.
Estimate. In inferential statistics, it is a value that approximates a population parameter.
Estimation. This is an area of inferential statistics where population values are determined by utilizing standard statistical
procedures.
Hypothesis testing. This is an area of inferential statistics consisting of standard procedures in decision-making for evaluating
claims about a population based on the characteristics of a sample, or samples, purportedly coming from that population.
Interval estimate. This is a range of values that may contain the parameter of a population.
Margin of Error (E). This is the maximum likely difference between the observed sample mean and the true value of the
population mean.
Mean. In a set of data, this is a measure of central tendency or location. This value is used to represent the entire set of data.
Parameter. This is a population value usually denoted by Greek letters.
Population. This is the set of all people, objects, events, or ideas one wishes to investigate.
Point estimate. This is a specific numerical value used to estimate a population parameter.
Proportion. In frequencies obtained from surveys, it is a fraction whose numerator is the number of favorable responses and whose denominator is the total number of respondents. In general, it is the number obtained when we divide the observed frequency of a subset by the total number of cases in the set.
P approach. This is a decision-making process where the null hypothesis is evaluated by assuming it to be true and then test the
reasonableness of this assumption by calculating the probability of getting the results if chance alone is operating.
P-value. This is the obtained probability value when utilizing the p approach.
Random sample. This is a sample obtained from a target population utilizing random sampling techniques.
Random variable. A function that associates a real number to each element in the sample space.
Sample. This is a subgroup of a target population.
Significant difference. This is a difference between two or more values that is too big to be ignored in decision-making. Marginal differences are usually ignored, and their occurrence is attributed to chance factors.
Standard deviation. This is a measure of dispersion, or spread, in a given set of numerical data.
Standard normal curve. This is a normal probability distribution that has a mean of 0 and a standard deviation of 1.
Statistic. This is a value obtained from a sample data.
Statistical hypothesis. This is an assertion or a conjecture about one or more populations.
Test statistic. This refers to the statistical value that is appropriate for a particular analysis. This value is the result of applying a
specific formula.
T-distribution. This is a family of distributions used in place of the normal distribution when the sample size is small (typically less than 30) and the population standard deviation is unknown.
T-table. This table provides the confidence coefficients or critical values when the t-test statistic is applied.
T-value. This is the statistic resulting from the application of a t-test statistic.
Z-table. The table provides the confidence coefficients or critical values when the z-test statistic is applied.
Z-value. This is the statistic resulting from the application of z-test statistic.
