Statistics and Probability M - PLV TextBook
PROBABILITY
UNIT 1
1 Statistics and Data
2 Collection of Data
3 Presentation of Data
Statistics can be defined as the process behind how we make discoveries, make decisions based on data, and make predictions. The application of Statistics is very wide, for it plays a vital role in every field of human activity.
For instance, during the COVID-19 pandemic that we all experienced in 2020, data updates on COVID-19 from around the world were released every day. These data included the number of active cases, recoveries, and deaths. Through statistics, national governments, health organizations, and universities were able to make decisions on how to stop and prevent the spread of the coronavirus, such as imposing community quarantines that restricted gatherings of people. They were also able to make predictions and set goals during the pandemic.
In this unit, you will learn the basic concepts of statistics, and how to collect and present data.
Lesson 1 Statistics and Data
At the end of this lesson, you are expected to:
identify the different branches of Statistics,
define sample and population,
distinguish parameter and statistic,
illustrate quantitative and qualitative data, and
distinguish and illustrate the different levels of measurement.
Pre-assessment:
Identify if the variable being described is quantitative or qualitative.
1. Monthly income in a household
2. Beverage preference
3. Degree of agreement
4. Learner reference number
5. Average score of students in a quiz
WHAT IS STATISTICS?
It is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
DATA
A collection of facts obtained from experiments, observations, sample surveys, censuses, and administrative reporting systems.
VARIABLE
A characteristic that is observable or measurable in every unit of the population.
BRANCHES OF STATISTICS
Descriptive Statistics
It is the branch of Statistics that involves the organization, summarization, and display of data.
Inferential Statistics
The branch of Statistics that uses data from samples to make inferences about the population from which
the sample was drawn. In inferential statistics, we use statistics to estimate parameters.
Population
The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample
A subset, or a part, of a population.
Parameter – a numerical measure that describes a characteristic of a population.
Statistic – a numerical measure that describes a characteristic of a sample.
Here are some examples of parameters and statistics that we will be using in this module:
Measure              Parameter   Statistic
Mean                 μ           x̄
Proportion           p           p̂
Variance             σ²          s²
Standard Deviation   σ           s
Identify whether each of the following describes a parameter or a statistic.
1. The proportion of all patients who recovered from COVID-19 for the month of June.
2. The mean difference in scores between a randomly selected class taught statistics by a new method and another class taught by an old method.
3. The mean score of all incoming senior high school students of Pamantasan ng Lungsod ng Valenzuela in their entrance exam.
4. The proportion of voters residing in Valenzuela among all the voters of the Philippines.
5. The variability of the salaries of 10% of the employees in the company.
6. The average height of 100 Grade 11 students in PLV.
Answer:
1. Parameter
2. Statistic
3. Parameter
4. Parameter
5. Statistic
6. Statistic
TYPES OF VARIABLES
Discrete – variables whose values are counted; they can only be expressed in whole units.
Continuous – variables whose values are measured; the possible values are uncountably infinite.
LEVELS OF MEASUREMENT
Nominal – measurements that serve as labels to identify items or classes. The data are classified into categories and cannot be arranged in any particular order.
Example: Student number, color, Music genre, sex
Ordinal – measurements that reflect the rank order of the individuals or objects. It can be arranged in
some order, but the differences between data values cannot be determined or are meaningless. It does
not tell how much one is different from the other.
Example: Social status, hardness of minerals, degrees of agreement
Interval – the values of the variable can be ranked, and the differences between values show the distances between them. It has no true zero point. A true zero point refers to the absence of the characteristic.
Example: temperature, test scores
Ratio – the highest level of measurement. The differences between values show the distances between them, the ratio of two values is meaningful, and it has a true zero point or absolute zero.
Example: height, age, weight
Lesson 2 Collection of Data
WHAT YOU SHOULD LEARN
At the end of this lesson, you are expected to:
identify the different sources of data,
identify the different methods of collecting data, and
identify the sampling techniques to be used.
Pre-assessment:
1. Identify if the source is a primary or secondary source.
a. Wikipedia
b. Interview
c. Administrative data
d. Journal
e. Textbook
2. Differentiate probability and non-probability sampling.
DATA SOURCES
Variables are observed or measured using any of the three methods of data collection: objective, subjective, and use of existing records.
Primary – data obtained directly from the source through the objective or subjective method.
Secondary – data obtained through the use of existing records, or data collected by other entities for their own purposes.
Primary data
Advantages: You know how the data was collected. You can get exactly the data you need. You know the accuracy of the data.
Disadvantages: It will take a long time to collect the data you need. It can be expensive.
Secondary data
Advantages: It is a quick and cheap way to get a large amount of data.
Disadvantages: You may not know how the data was collected. You may not get the exact data you need. The data might be out of date. You may not know how accurate the data is.
METHODS OF COLLECTING DATA
INTERVIEW METHOD
o DIRECT – the researcher personally interviews the respondents.
o INDIRECT – the researcher uses a telephone, webcam, or cellphone to interview the respondents.
TEST METHOD – this is widely used in psychological research and psychiatry. Standard tests are used because of their validity, reliability, and usability.
REGISTRATION METHOD – mechanical devices such as the camera, projector, video tape, and tape recorder can be used to gather data in social and educational research.
SAMPLING TECHNIQUES
RANDOM OR PROBABILITY SAMPLING – one in which every member of the population has
an equal chance of being selected.
Simple Random Sampling – the names of the respondents are written on small pieces of paper, rolled, placed in a jar, and then picked at random.
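Python's random.sample gives a rough digital version of the jar method: it draws names without replacement. In the sketch below the name list is hypothetical, and the generator is seeded only so the draw can be repeated.

```python
import random

# Hypothetical list of respondents (for illustration only)
names = ["Ana", "Ben", "Carla", "Dan", "Ella", "Fe", "Gino", "Hana", "Ivan", "Jo"]

def simple_random_sample(population, n, seed=None):
    """Pick n different members; each member has the same chance of being picked,
    like drawing rolled papers from a jar without putting them back."""
    rng = random.Random(seed)   # seeded only so the draw can be repeated
    return rng.sample(population, n)

sample = simple_random_sample(names, 4, seed=1)
print(sample)
```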
Stratified Sampling – it is used when it is important for the sample to have members from each
segment of the population.
Depending on the focus of the study, members of the population are divided into two or
more subsets, called strata, that share a similar characteristic such as age, gender,
ethnicity, or even political preference.
Cluster Sampling – the population is divided into clusters, usually geographic groupings, and each cluster should contain members with all of the characteristics of the population. All of the members of one or more selected clusters are used.
Systematic Sampling – a sample in which each member of the population is assigned a number.
The members of the population are ordered in some way, a starting number is randomly selected
and then sample members are selected at regular intervals from the starting number. (Ex. Every 3rd,
5th, or 100th member is selected)
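The selection rule for systematic sampling can be written in a few lines of Python. The sketch assumes the population is already numbered 1 to 100 (a hypothetical size) and uses an interval of k = 5.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position 0..k-1
    return units[start::k]

members = list(range(1, 101))     # hypothetical population numbered 1 to 100
sample = systematic_sample(members, 5, seed=3)
print(sample[:4])                 # first four selected members, spaced 5 apart
```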
NON-RANDOM OR NON-PROBABILITY SAMPLING
Purposive – the respondents are chosen based on their knowledge of the information required by the researcher.
Example: Suppose a researcher wants to make a historical study about Town A. The target respondents are the senior citizens who have lived in Town A since birth, since they are the most reliable persons to know the history of the town.
Convenience – this technique is used by a researcher who needs the information in the fastest way possible.
Example: A computer software store conducts a marketing study by interviewing potential
customers who happen to be in the store browsing through the available software.
Quota – the sample is formed when the main consideration is to fill a designated proportional part of the population.
Example: You are to investigate the relationship between students' performance in Math and their attitude towards the subject. However, you are only given limited time to do the study, so you may only consider 25 out of the 500 students in your school.
KINDS OF GRAPHS/CHARTS
1. BAR GRAPH – a graph drawn using rectangular bars to show how large each value is. The bars can be either horizontal or vertical. It is used to show how one item relates to another. It is composed of an x-axis (horizontal) and a y-axis (vertical), where the x-axis has the categories being measured and the y-axis has a scale for the numbers in each category.
For instance, the Facebook page of BusinessWorld recorded the unique individuals tested per day for Coronavirus Disease 2019 (COVID-19) for the months of April and May 2020.
Using Figure 1.0, which day had the highest number of unique individuals tested? Which day had the lowest?
As shown in the graph, May 14 had the highest number of unique individuals tested, with a total of 10 841, while April 5 had the lowest, with 344 individuals tested during that day.
Another example uses a multiple bar graph to compare different sets of data that are not opposite in nature.
Figure 1.1
Using Figure 1.1, which has more savings on Monday? On Wednesday? On Friday?
2. PIE GRAPH/CHART – is a circle divided into sectors proportional to the frequencies. It shows how a part of something relates to the whole. It is important to define what the whole represents. It is used when you are showing the relative proportion or percentage of numbers that add up to a sum.
Another example is Figure 2.1, a pie chart whose sectors show 6%, 10%, 20%, 12%, 6%, 16%, and 30% of the whole.
Figure 2.1
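The sector sizes follow directly from the proportions: each central angle is (value ÷ total) × 360°. Here is a minimal Python sketch that uses the percentages read from Figure 2.1:

```python
# Sector percentages read from Figure 2.1
percentages = [6, 10, 20, 12, 6, 16, 30]

def sector_angles(values):
    """Central angle of each sector = (value / total) x 360 degrees."""
    total = sum(values)
    return [v / total * 360 for v in values]

angles = sector_angles(percentages)
print([round(a, 1) for a in angles])  # [21.6, 36.0, 72.0, 43.2, 21.6, 57.6, 108.0]
```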
3. LINE GRAPH
For instance, the graph at the left (Figure 3.0) shows the trend in the number of daily COVID-19 cases in the Philippines. This graph can show whether the country is flattening its curve in dealing with the disease. Another example is the analysis by the University of the Philippines of the post-Enhanced Community Quarantine (ECQ) measures relative to healthcare capacity (see Figure 3.1).
4. HISTOGRAM
For example, Mr. Jolly is worried about customer complaints regarding long queues in the branch. He first wants to analyze the frequency of customers' waiting times, so he called the cashier and asked for the details. Below are the waiting times of customers at the cash counter of Jolly Me during peak hours, as observed by the cashier. Let's use a histogram to show the data graphically.
Customer waiting time (in minutes), raw data:
2.30, 5.00, 3.55, 2.50, 5.10, 4.21, 3.33, 4.10, 2.55, 5.07, 3.45, 4.10, 5.12
Customer waiting time (in minutes)   Frequency
2.30-2.86                            3
2.86-3.43                            1
3.43-3.99                            2
3.99-4.56                            3
4.56-5.12                            4
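The frequency table above can be reproduced in Python. The class limits suggest five equal-width classes running from the minimum (2.30) to the maximum (5.12), and the sketch assumes exactly that. The helper name frequency_table is hypothetical.

```python
# Waiting times (in minutes) observed by the cashier during peak hours
times = [2.30, 5.00, 3.55, 2.50, 5.10, 4.21, 3.33, 4.10, 2.55, 5.07, 3.45, 4.10, 5.12]

def frequency_table(data, k):
    """Split [min, max] into k equal-width classes and count the values in each.
    A value on a boundary goes to the higher class; the maximum goes to the last class."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

edges, counts = frequency_table(times, 5)
for j, c in enumerate(counts):
    print(f"{edges[j]:.2f}-{edges[j + 1]:.2f}: {c}")
# 2.30-2.86: 3, 2.86-3.43: 1, 3.43-3.99: 2, 3.99-4.56: 3, 4.56-5.12: 4
```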
TRY IT YOURSELF
Construct a histogram given the ages of 100 Grade 7 students of General Tiburcio de Leon National High School.
Ages of Grade 7 students   Frequency
11-12                      13
12-13                      28
13-14                      22
14-15                      18
15-16                      9
16-17                      5
17-18                      3
18-19                      2
5. PICTURE GRAPH/ PICTOGRAM – it is a visual presentation of statistical quantities by means
of drawing pictures or symbols related to the subject under study. See figure a.
6. MAP GRAPH/CARTOGRAM – it is one of the best ways to present geographical data. This
kind of graph is always accompanied by a legend which tells us the meaning of the lines,
colors or other symbols used and positioned in a map. See figure b.
7. SCATTER PLOT DIAGRAM – it is a graphical device to show the relationship between two
quantitative variables. See figure c.
Figure a. COVID-19 Medical Assistance (OCHA)
Figure b. Class Suspension in Metro Manila (Earth Shaker)
Figure c. Ice Cream Sales vs. Noon Temperature
On the Recommended Charts tab, scroll through the list of charts that Excel recommends for your data, and
click any chart to see how your data will look.
If you don’t see a chart you like, click All Charts to see all the available chart types.
When you find the chart you like, click it > OK.
Use the Chart Elements, Chart Styles, and Chart Filters buttons, next to the upper-right corner of the chart
to add chart elements like axis titles or data labels, customize the look of your chart, or change the data that is
shown in the chart.
To access additional design and formatting features, click anywhere in the chart to add the CHART TOOLS to the
ribbon, and then click the options you want on the DESIGN and FORMAT tabs.
2. The inflation rates of the Philippines from 2008-2014. Use a line graph.
UNIT 2 RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
1 Random Variables
This unit will discuss the concept of random variable and probability distribution. You will learn how to
construct the probability mass function of a discrete probability distribution and describe its properties
and characteristics by computing its mean and variance.
Lesson 1 Random Variables
At the end of this lesson, you are expected to:
illustrate a random variable,
distinguish between a discrete random variable and a continuous random variable; and
find the possible values of a random variable.
Experiment   Sample Space
1. Tossing three coins
2. Rolling a die
3. Getting a defective item when two items are randomly selected from a box of two defective and three non-defective items
RANDOM VARIABLES
A random variable is a function or rule that assigns a number to each outcome of an experiment.
It is denoted by an uppercase letter while its lowercase counterpart represents the value of the
random variable.
Discrete Random Variable – a random variable whose set of possible values is countable.
Example: In tossing a coin, let X be the random variable representing the number of tails that occur.
X = 0, if it is head and X = 1, if it is tail.
Continuous Random Variable – a random variable whose set of possible values is uncountable, such as all the values in an interval of real numbers.
Example: An experiment is conducted to determine the distance that a certain type of car will travel
using 10 liters of gasoline over a prescribed test course. Let Y be the random variable representing
the distance, then Y ≥ 0.
TRY THIS!
Random or Not?
For each of the following, indicate whether it is or is not a random variable. Classify each random variable as either discrete or continuous.
Example 2:
Suppose three cell phones are tested at random. Let D represent a defective cell phone and let N represent a non-defective cell phone. Let Y be the random variable representing the number of defective cell phones.
Steps   Solution
1. Determine the sample space. Let D represent the defective cell phone and let N represent the non-defective cell phone.   S = {NNN, NND, NDN, DNN, DDN, DND, NDD, DDD}
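The sample space and the possible values of Y can be listed mechanically. This Python sketch generates all 2³ = 8 outcomes and counts the defective phones in each.

```python
from itertools import product
from collections import Counter

# Each phone is either D (defective) or N (non-defective)
sample_space = ["".join(p) for p in product("DN", repeat=3)]
# Y = number of defective phones in each outcome
y_values = {outcome: outcome.count("D") for outcome in sample_space}
counts = Counter(y_values.values())
print(len(sample_space), sorted(counts.items()))  # 8 [(0, 1), (1, 3), (2, 3), (3, 1)]
```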
PROBABILITY DISTRIBUTION
A probability distribution is a function or rule that assigns a probability to each value of a random variable.
As we noted earlier, we use an uppercase letter to represent the random variable and a lowercase letter to represent a value of the random variable. Then, we represent the probability that the random variable X will equal x as P(X = x).
Example 1:
Suppose three coins are tossed. Let Z be the random variable representing the number of heads that occur. Find the probability value P(Z) for each value of the random variable.
Steps   Solution
1. Determine the sample space. Let H represent head and T represent tail.   S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}
Table 1.1. The Probability Distribution or the Probability Mass Function of Discrete Random Variable Z
Z      0     1     2     3
P(Z)   1/8   3/8   3/8   1/8
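Enumerating the outcomes gives Table 1.1 directly. This small sketch assumes fair coins, so the 8 outcomes are equally likely. Fraction keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
heads = Counter(o.count("H") for o in outcomes)   # Z = number of heads
pmf = {z: Fraction(c, len(outcomes)) for z, c in sorted(heads.items())}
for z, p in pmf.items():
    print(z, p)                                   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```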
Example 2:
In a recent census, the number of televisions per household was recorded:
Number of televisions   0       1        2        3        4       5
Number of households    1 218   32 379   37 961   19 386   7 714   2 842
Solution:
ii. Assign the probability value P(X) to each value of the random variable. Reduce it to its lowest terms, if possible.
X   P(X)
0   3/250
1   32 379/101 500
2   187/500
3   9 693/50 750
4   19/250
5   7/250
Table 1.2. The Probability Distribution or the Probability Mass Function of Discrete Random Variable X
X      0       1                2         3              4        5
P(X)   3/250   32 379/101 500   187/500   9 693/50 750   19/250   7/250
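Solution:
P(X ≤ 2): we are looking for the probability that the number of televisions per household is less than or equal to 2. Those values of X are 0, 1, and 2. Thus,
P(X ≤ 2) = P(0) + P(1) + P(2) = (1 218 + 32 379 + 37 961)/101 500 = 71 558/101 500 ≈ 0.7050
P(X > 2): we are looking for the probability that the number of televisions per household is greater than 2. Those values of X are 3, 4, and 5. Thus,
P(X > 2) = P(3) + P(4) + P(5) = (19 386 + 7 714 + 2 842)/101 500 = 29 942/101 500 ≈ 0.2950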
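Python can check the probabilities in Table 1.2 and the two cumulative probabilities. Fraction reduces each ratio to lowest terms automatically, which matches the instruction to reduce P(X) where possible.

```python
from fractions import Fraction

# Number of households by number of televisions (census data above)
households = {0: 1218, 1: 32379, 2: 37961, 3: 19386, 4: 7714, 5: 2842}
total = sum(households.values())                               # 101 500 households
pmf = {x: Fraction(n, total) for x, n in households.items()}   # reduced automatically

p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)
p_more_than_2 = sum(p for x, p in pmf.items() if x > 2)
print(pmf[0], pmf[2], round(float(p_at_most_2), 4), round(float(p_more_than_2), 4))
```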
Example 3:
An online seller advertises that she will deliver the products that a customer purchases in 3 to 6 days. The seller wants to be precise in her advertising. Accordingly, she records the number of days it takes her to deliver the goods to customers. From the data, the following probability distribution is developed.
Number of days 0 1 2 3 4 5 6 7 8
Probability 0 0 0.01 0.04 0.28 0.42 0.21 0.02 0.02
a. What is the probability that a delivery will be made within the advertised 3 to 6 day period?
b. What is the probability that a delivery will be late?
c. What is the probability that a delivery will be early?
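Solution:
a. What is the probability that the delivery will be made within the 3 to 6 day period?
P(3 ≤ X ≤ 6) = 0.04 + 0.28 + 0.42 + 0.21 = 0.95
b. A delivery is late if it takes more than 6 days: P(X > 6) = 0.02 + 0.02 = 0.04
c. A delivery is early if it takes fewer than 3 days: P(X < 3) = 0 + 0 + 0.01 = 0.01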
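Here is a short Python check of the three delivery probabilities, using the table above:

```python
# Delivery-time distribution from the table above (days -> probability)
pmf = {0: 0, 1: 0, 2: 0.01, 3: 0.04, 4: 0.28, 5: 0.42, 6: 0.21, 7: 0.02, 8: 0.02}

on_time = sum(p for d, p in pmf.items() if 3 <= d <= 6)  # within 3 to 6 days
late = sum(p for d, p in pmf.items() if d > 6)           # more than 6 days
early = sum(p for d, p in pmf.items() if d < 3)          # fewer than 3 days
print(round(on_time, 2), round(late, 2), round(early, 2))  # 0.95 0.04 0.01
```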
3. The probability of each value of the random variable must be between 0 and 1, inclusive. In symbols, we write it as 0 ≤ P(X) ≤ 1.
4. The sum of the probabilities of all values of the random variable must be equal to 1. In symbols, we write it as ∑P(X) = 1.
Lesson 3 Mean, Variance, and Standard Deviation of a Probability Distribution
WHAT YOU SHOULD LEARN
At the end of this lesson, you are expected to:
illustrate the mean, variance, and standard deviation of a discrete random variable,
calculate the mean or expected value of a discrete probability distribution; and
compute the variance and standard deviation of a discrete probability distribution.
Pre-assessment:
Complete the following frequency distribution table:
X    F    (X − X̄)    (X − X̄)²    F(X − X̄)²
5    3
8    5
10   4
12   5
15   3
     n = 20
Find:
a. Mean
b. Variance
c. Standard Deviation
Steps   Solution
1. Construct the probability distribution for the random variable.
X        P(x)
₱2 000   0.999
X        P(x)    XP(x)
₱2 000   0.999   1 998
ΣXP(x) = ₱1 900
INTERPRETATION: The insurance company's expected gain from each individual who avails of the policy is ₱1 900 per year.
VARIANCE AND STANDARD DEVIATION OF A DISCRETE RANDOM VARIABLE
The variance and standard deviation describe the amount of spread, dispersion, or variability of the
items in the distribution.
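Formula for the Variance and Standard Deviation of a Discrete Probability Distribution
\( \sigma^2 = \sum [(X - \mu)^2 \cdot P(X)] \) or \( \sigma^2 = \sum [X^2 \cdot P(X)] - \mu^2 \), and \( \sigma = \sqrt{\sigma^2} \), where \( \mu = \sum [X \cdot P(X)] \)
Example:
1. Determine the variance and standard deviation of the following probability mass function.
X      1      2      3      4      5      6
P(x)   0.15   0.25   0.30   0.15   0.10   0.05
\( \mu = \sum [X \cdot P(x)] = 2.95 \)
\( \sigma^2 = \sum [X^2 \cdot P(x)] - \mu^2 = 10.55 - (2.95)^2 = 1.8475 \) or 1.85
\( \sigma = \sqrt{1.8475} \approx 1.36 \)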
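This example lets you compare both forms of the variance formula. Here is a minimal Python sketch, with the mean computed as μ = ΣX·P(x):

```python
import math

x = [1, 2, 3, 4, 5, 6]
p = [0.15, 0.25, 0.30, 0.15, 0.10, 0.05]

mu = sum(xi * pi for xi, pi in zip(x, p))                           # sum of X*P(X)
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))              # sum of (X - mu)^2 * P(X)
var_shortcut = sum(xi ** 2 * pi for xi, pi in zip(x, p)) - mu ** 2  # sum of X^2*P(X) - mu^2
sd = math.sqrt(var)
print(round(mu, 2), round(var, 4), round(var_shortcut, 4), round(sd, 2))  # 2.95 1.8475 1.8475 1.36
```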
A normal distribution can have any mean and any positive standard deviation. These two parameters completely determine the shape of the normal curve. The mean gives the location of the line of symmetry, and the standard deviation describes how much the data are spread out.
The total area under the normal distribution
curve is equal to 1.00 or 100%.
Empirical Rule
The area under the normal curve that lies within
one standard deviation of the mean is approximately 0.68 (68%),
two standard deviations of the mean is approximately 0.95 (95%), and
three standard deviations of the mean is approximately 0.997 (99.7%).
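Python's statistics.NormalDist (available since Python 3.8) computes the cumulative area of a normal curve, so it can check the empirical-rule percentages. This is only a numerical check; it does not derive the rule.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
areas = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]  # area within k standard deviations
for k, area in zip((1, 2, 3), areas):
    print(k, round(area, 4))  # 1 0.6827, 2 0.9545, 3 0.9973
```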
The standard normal distribution is a normal distribution with a mean of 0 and a standard
deviation of 1.
Since each normally distributed variable has its own mean and standard deviation, the shape and location
of these curves will vary. In practical applications, one would have to have a table of areas under the curve
for each variable. To simplify this, statisticians use the standard normal distribution.
Standard Normal Cumulative Probability Table
Cumulative probabilities for NEGATIVE z-values are shown in the following table:
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
EXAMPLE
2. Find the area that corresponds to z = -2.57 by finding -2.5 in the left column and then moving across the row to the column under 0.07. The number in that row and column is 0.0051. So, the area to the left of z = -2.57 is 0.0051.
3. Find the area that corresponds to z = 0.36 by finding 0.3 in the left column and then moving across the row to the column under 0.06. The number in that row and column is 0.6406. So, the area to the left of z = 0.36 is 0.6406.
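NormalDist().cdf returns the cumulative area to the left of z, so it can cross-check the table lookups. Rounded to four decimals, it should reproduce the table entries.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # cumulative area to the left of z
for z in (-2.57, 0.36, 1.13):
    print(z, round(phi(z), 4))   # -2.57 0.0051, 0.36 0.6406, 1.13 0.8708
```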
Lesson 2 Regions of Area Under the Normal Curve
Area Under the Standard Normal Distribution Curve
1. To the left of any z value:
Look up the z value in the table and use the area given.
Example: Find the area between z = -1.37 and z = 1.13.
1. Use the table to find the area for each z score.
2. The area to the left of z = 1.13 is 0.8708.
3. The area to the left of z = -1.37 is 0.0853.
4. Subtract to find the area of the region between the two z-scores: 0.8708 – 0.0853 = 0.7855. Multiply it by 100 to express it as a percentage: 78.55%.
Example: Find the area if z < -2.47 or z > 1.13.
1. Use the table to find the area for each z score.
2. The area to the left of z = -2.47 is 0.0068, or 0.68%.
3. The area to the left of z = 1.13 is 0.8708, so the area to its right is 1 – 0.8708 = 0.1292, or 12.92%.
4. Add the two areas: 0.0068 + 0.1292 = 0.1360, or 13.60%.
Alternative Solution
1. Use the table to find the area for each z score.
2. The area to the left of z = -2.47 is 0.0068.
3. The area to the left of z = 1.13 is 0.8708.
4. The area between the two z-scores is 0.8708 – 0.0068 = 0.8640, or 86.40%, so the area outside it is 1 – 0.8640 = 0.1360, or 13.60%.
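Both region calculations can be reproduced with the cumulative area function. The table rounds z to two decimals and areas to four. The computed values can therefore differ from the hand results in the fourth decimal place.

```python
from statistics import NormalDist

phi = NormalDist().cdf
between = phi(1.13) - phi(-1.37)            # area between z = -1.37 and z = 1.13
two_tails = phi(-2.47) + (1 - phi(1.13))    # area for z < -2.47 or z > 1.13
print(round(between, 4), round(two_tails, 4))
```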
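Find the z value such that the area under the standard normal distribution curve between 0 and the z value is 0.2123.
Add .5000 to .2123 to get the cumulative area of .7123. Then look for that value in the table. The cumulative area .7123 corresponds to z = 0.56. (In the table of negative z values above, the area 1 – .7123 = .2877 appears in row -0.5 under column 0.06, so by symmetry z = 0.56.)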
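The reverse lookup is the inverse of the cumulative distribution. In Python, NormalDist().inv_cdf performs it directly. A short sketch:

```python
from statistics import NormalDist

cumulative = 0.5000 + 0.2123          # area to the left of z
z = NormalDist().inv_cdf(cumulative)
print(round(z, 2))                    # 0.56
```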
EXAMPLE
A survey by the National Retail Federation found that women spend on average $146.21 for the Christmas holidays. Assume the standard deviation is $29.44. Find the percentage of women who spend less than $160.00. Assume the variable is normally distributed.
Step 1: Draw the normal distribution curve.
Step 2: Find the z value corresponding to $160.00.
\( z = \frac{X - \mu}{\sigma} = \frac{160.00 - 146.21}{29.44} = 0.47 \)
Step 3: Find the area to the left of z = 0.47. The table gives us an area of .6808, or 68.08%.
Therefore, about 68% of women spend less than $160.
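You can also compute this probability without rounding z first. In this Python sketch, NormalDist(mu, sigma).cdf standardizes internally. The result (about 0.680) therefore differs slightly from the table value .6808, which uses z rounded to 0.47.

```python
from statistics import NormalDist

spending = NormalDist(mu=146.21, sigma=29.44)
z = (160.00 - 146.21) / 29.44        # about 0.47
p = spending.cdf(160.00)             # area to the left of $160.00
print(round(z, 2), round(p, 4))
```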
EXAMPLE
Each month, an American household generates an average of 28 pounds of newspaper for garbage or recycling. Assume the standard deviation is 2 pounds. If a household is selected at random, find the probability of its generating between 27 and 31 pounds per month. Assume the variable is approximately normally distributed.
Step 1: Draw the normal distribution curve.
Step 2: Find the z values corresponding to 27 and 31 pounds.
\( z = \frac{X - \mu}{\sigma} = \frac{27 - 28}{2} = -0.5 \)
\( z = \frac{X - \mu}{\sigma} = \frac{31 - 28}{2} = 1.5 \)
Step 3: Find the area between z = -0.5 and z = 1.5.
1. The area to the left of z = -0.5 is .3085.
2. The area to the left of z = 1.5 is .9332.
3. Subtract #1 from #2:
.9332 - .3085 = .6247
The probability is .6247, or about 62%.
EXAMPLE
The American Automobile Association reports that the average time it takes to respond to an emergency call is 25 minutes. Assume the variable is approximately normally distributed and the standard deviation is 4.5 minutes. If 80 calls are randomly selected, approximately how many will be responded to in less than 15 minutes?
Step 1: Draw the normal distribution curve.
Step 2: Find the z value corresponding to 15 minutes.
\( z = \frac{X - \mu}{\sigma} = \frac{15 - 25}{4.5} = -2.22 \)
Step 3: Find the area to the left of z = -2.22. The table gives an area of .0132.
Step 4: Multiply the area by the number of calls: 80 × .0132 = 1.056, so approximately 1 call will be responded to in less than 15 minutes.
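The newspaper and emergency-call examples follow the same pattern. One finds the area between two values; the other finds the area below one value and multiplies it by the number of calls. Here is a Python sketch of both. Because z is not rounded, the results can differ from the table values in the fourth decimal place.

```python
from statistics import NormalDist

# Newspaper example: mean 28 lb, standard deviation 2 lb
paper = NormalDist(mu=28, sigma=2)
p_between = paper.cdf(31) - paper.cdf(27)       # P(27 < X < 31)

# Emergency-call example: mean 25 min, standard deviation 4.5 min, 80 calls
calls = NormalDist(mu=25, sigma=4.5)
p_fast = calls.cdf(15)                          # P(X < 15)
expected_calls = 80 * p_fast                    # expected number among the 80 calls
print(round(p_between, 4), round(p_fast, 4), round(expected_calls))
```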
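To qualify for a police academy, candidates must score in the top 10% on a general abilities test. The test has a mean of 200 and a standard deviation of 20. Find the lowest possible score to qualify. Assume the test scores are normally distributed.
The specific value X cuts off the top 10%, so the area to its left is 1 – .1000 = .9000. The area closest to .9000 corresponds to z = 1.28 (by symmetry, the area .1003 appears in the table in row -1.2 under column 0.08).
\( X = \mu + z\sigma = 200 + 1.28(20) = 225.6 \)
Therefore, the lowest possible score to qualify is about 226.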
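Finding a score from an area is the inverse problem. With Python's NormalDist, it takes a single inv_cdf call. A sketch:

```python
from statistics import NormalDist

scores = NormalDist(mu=200, sigma=20)
cutoff = scores.inv_cdf(0.90)    # score with 90% of the area to its left
print(round(cutoff, 1))          # 225.6
```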
At the end of this lesson, you are expected to:
illustrate random sampling,
identify the sampling distribution of statistics (sample mean), and
find the mean and variance of the sampling distribution of the sample mean.
PRE-ASSESSMENT:
Ednel is working at TN Department Store in Valenzuela City. The numbers of bags he was able to sell for three days are 20, 30, and 50. List all the possible samples of size 2 that can be drawn from the population with replacement.
A SAMPLING DISTRIBUTION is the probability distribution of a sample statistic that is formed when samples of size n are taken from a population.
SAMPLING ERROR refers to the difference between the sample mean and the population mean.
EXAMPLE #1
A population consists of the numbers 2, 4, 9, 10, and 5. Let us list all possible samples of size 3 from this population and compute the mean of each sample.
STEP 1: Determine the number of sets of all possible random samples that can be drawn from a given population.
\( _{N}C_{n} = \frac{N!}{n!(N-n)!} = \frac{5!}{3!(5-3)!} = \frac{5!}{3!\,2!} = 10 \)
List all the possible samples and compute the mean of each sample.
Sample     x̄
2, 4, 9    5
2, 4, 10   5.33
2, 4, 5    3.67
2, 9, 10   7
2, 9, 5    5.33
2, 10, 5   5.67
4, 9, 10   7.67
4, 9, 5    6
4, 10, 5   6.33
9, 10, 5   8
STEP 2: Construct the sampling distribution of the means.
x̄      Frequency   P(x̄)
3.67   1           1/10
5      1           1/10
5.33   2           2/10
5.67   1           1/10
6      1           1/10
6.33   1           1/10
7      1           1/10
7.67   1           1/10
8      1           1/10
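Listing every sample by hand gets error-prone as N grows. itertools.combinations generates exactly the NCn unordered samples without replacement. Here is a sketch that keeps the sample means as exact fractions:

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

population = [2, 4, 9, 10, 5]
n = 3

samples = list(combinations(population, n))          # 5C3 = 10 samples, no repeats
means = [Fraction(sum(s), n) for s in samples]       # exact sample means
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

for m, p in dist.items():
    print(round(float(m), 2), p)
```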
EXAMPLE #2
Going back to the situation given in the Pre-Assessment, let's identify the samples given the number of bags Ednel was able to sell for three days, which are 20, 30, and 50 (with replacement).
STEP 1: List all the possible samples and compute the mean of each sample.
Observation   Sample   x̄
1             20, 30   (20+30)/2 = 25
2             20, 50   (20+50)/2 = 35
3             30, 50   (30+50)/2 = 40
4             30, 20   (30+20)/2 = 25
5             50, 20   (50+20)/2 = 35
6             50, 30   (50+30)/2 = 40
7             20, 20   (20+20)/2 = 20
8             30, 30   (30+30)/2 = 30
9             50, 50   (50+50)/2 = 50
STEP 2: Construct the sampling distribution of the means.
x̄    Frequency   P(x̄)
20   1           1/9
25   2           2/9
30   1           1/9
35   2           2/9
40   2           2/9
50   1           1/9
STEP 3: Illustrate using a histogram.
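When sampling with replacement, the ordered samples come from itertools.product instead, giving 3² = 9 samples. A sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

bags = [20, 30, 50]
samples = list(product(bags, repeat=2))   # ordered samples with replacement: 3^2 = 9
means = Counter(Fraction(sum(s), 2) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(means.items())}
for m, p in dist.items():
    print(m, p)                           # 20 1/9, 25 2/9, 30 1/9, 35 2/9, 40 2/9, 50 1/9
```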
EXAMPLE #3
Nanno receives 92 or 93 as her grade in each of her three major subjects: Basic Calculus (BC), General Chemistry (GC), and General Biology (GB). Construct the sampling distribution of her mean grade.
STEP 1: List all the possible samples and compute the mean of each sample.
BC   GC   GB   x̄
92   92   92   92
92   92   93   92.33
92   93   92   92.33
92   93   93   92.67
93   92   92   92.33
93   92   93   92.67
93   93   92   92.67
93   93   93   93
2.1. What is the probability that her mean grade is lower than 93?
Seven of the eight samples have a mean lower than 93, so the probability is 7/8 = 87.5%.
2.2. What is the probability that her mean grade is greater than 92.33?
Four of the eight samples (92.67 three times and 93 once) have a mean greater than 92.33.
Therefore, the probability that her mean grade is greater than 92.33 is 4/8 = 50%.
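Enumeration verifies both probabilities. One caveat in this sketch: the table's 92.33 stands for 92⅓. The comparison therefore uses the exact value 277/3 rather than the decimal 92.33.

```python
from itertools import product
from fractions import Fraction

grades = list(product([92, 93], repeat=3))          # (BC, GC, GB): 2^3 = 8 outcomes
means = [Fraction(sum(g), 3) for g in grades]       # exact means, e.g. 277/3 = 92.33...

p_below_93 = Fraction(sum(m < 93 for m in means), len(means))
# 92.33 in the table is 277/3; comparing with the decimal 92.33 would
# also count the three means equal to 277/3.
p_above = Fraction(sum(m > Fraction(277, 3) for m in means), len(means))
print(p_below_93, p_above)   # 7/8 1/2
```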
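PROPERTIES OF SAMPLING DISTRIBUTION OF SAMPLE MEANS
1. The mean of the sample means \( \mu_{\bar{x}} \) is given by
\( \mu_{\bar{x}} = \sum [\bar{x} \cdot P(\bar{x})] \)
2. The variance of the sampling distribution of the sample means \( \sigma^2_{\bar{x}} \) is given by
\( \sigma^2_{\bar{x}} = \sum [P(\bar{x}) \cdot (\bar{x} - \mu)^2] \)
or
\( \sigma^2_{\bar{x}} = \sum [\bar{x}^2 \cdot P(\bar{x})] - \mu^2 \)
\( \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \) for infinite population (with replacement)
\( \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \) for finite population (without replacement)
3. The standard deviation of the sampling distribution of the sample mean is given by:
\( \sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} \) for infinite population (with replacement)
\( \sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}} \) for finite population (without replacement)
The standard deviation of the sampling distribution of the sample mean, \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \), is called the STANDARD ERROR of the mean.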
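EXAMPLE #1.1
\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{(2 - 6)^2 + (4 - 6)^2 + (9 - 6)^2 + (10 - 6)^2 + (5 - 6)^2}{5} = \frac{46}{5} = 9.2 \)
c. Compute the mean of the sample means x̄.
x̄      P(x̄)   x̄ · P(x̄)
3.67   1/10   0.367
5      1/10   0.5
5.33   2/10   1.066
5.67   1/10   0.567
6      1/10   0.6
6.33   1/10   0.633
7      1/10   0.7
7.67   1/10   0.767
8      1/10   0.8
\( \mu_{\bar{x}} = \sum [\bar{x} \cdot P(\bar{x})] = 6 \)
\( \sigma^2_{\bar{x}} = \sum [\bar{x}^2 \cdot P(\bar{x})] - \mu^2 = 37.53334 - (6)^2 \)
\( \sigma^2_{\bar{x}} = 1.53334 \) or 1.53
You can use the alternative method. Using the population variance,
\( \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \)
\( \sigma^2_{\bar{x}} = \frac{9.2}{3} \cdot \frac{5 - 3}{5 - 1} = 1.53 \)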
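Exact arithmetic can check the two routes to σ²x̄ (enumeration and the finite-population formula) against each other. Every sample is equally likely. Averaging over all 10 samples is therefore the same as weighting each distinct mean by P(x̄).

```python
from itertools import combinations
from fractions import Fraction

population = [2, 4, 9, 10, 5]
N, n = len(population), 3

mu = Fraction(sum(population), N)                         # population mean = 6
sigma2 = sum((x - mu) ** 2 for x in population) / N       # population variance = 46/5 = 9.2

means = [Fraction(sum(s), n) for s in combinations(population, n)]
mu_xbar = sum(means) / len(means)                         # mean of the sample means
var_xbar = sum(m ** 2 for m in means) / len(means) - mu_xbar ** 2   # sum of xbar^2 P(xbar) - mu^2
var_formula = sigma2 / n * Fraction(N - n, N - 1)         # finite population formula

print(mu_xbar, float(sigma2), var_xbar, round(float(var_xbar), 2))
```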
EXAMPLE #4
\( \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \)
\( \sigma^2 = (\sigma^2_{\bar{x}})(n) \)
\( \sigma^2 = (2.5)(4) \)
\( \sigma^2 = 10 \)
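EXAMPLE #5
Suppose a random sample of size 200 is taken from a population with a mean of 510 kg and a standard deviation of 15 kg.
a. Find the mean and the variance of the sample mean.
b. If it is required to reduce the standard error of the mean to less than 0.5 kg, what is the minimum sample size?
a. \( \mu_{\bar{x}} = \mu = 510 \) kg
\( \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} = \frac{15^2}{200} = 1.125 \approx 1.13 \)
b. \( \frac{\sigma}{\sqrt{n}} < 0.5 \Rightarrow \sqrt{n} > \frac{15}{0.5} \)
\( \sqrt{n} > 30 \)
\( n > 900 \)
Therefore, the minimum sample size is 901.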
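Here is a Python sketch of both parts. The loop searches for the smallest whole-number n whose standard error σ/√n is below 0.5 kg.

```python
import math

mu, sigma, n = 510, 15, 200
mean_xbar = mu                      # mean of the sample means equals the population mean
var_xbar = sigma ** 2 / n           # 225 / 200 = 1.125

# Smallest n for which the standard error sigma / sqrt(n) is below 0.5 kg
n_min = 1
while sigma / math.sqrt(n_min) >= 0.5:
    n_min += 1
print(mean_xbar, var_xbar, n_min)   # 510 1.125 901
```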
PRE-ASSESSMENT:
At the end of this lesson, you are Try this before you proceed to the next part of the lesson.
expected to:
illustrate the Central Limit Given a die, it has 6 faces in which each
Theorem, face has either dot/s of 𝑥 = 1, 2, 3, 4, 5, 6.
defines the sampling Given it as the population, consider the
distribution of mean using following sample size:
the Central Limit Theorem, 𝑛=1
and 𝑛=2
solve problems involving 𝑛=3
sampling distribution of
mean. Illustrate the probability histogram of the
sampling distribution of the mean.
For n = 1
x̄      Frequency    P(x̄)
1       1            1/6
2       1            1/6
3       1            1/6
4       1            1/6
5       1            1/6
6       1            1/6
For n = 2
x̄      Frequency    P(x̄)
1       1            1/36
1.5     2            2/36
2       3            3/36
2.5     4            4/36
3       5            5/36
3.5     6            6/36
4       5            5/36
4.5     4            4/36
5       3            3/36
5.5     2            2/36
6       1            1/36
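These tables can be generated by enumeration. The sketch below is a minimal Python example, assuming the rolls in a sample are independent (sampling with replacement from the six faces). It lists every possible sample, computes its mean, and counts how often each mean occurs; it also produces the n = 3 case asked for in the pre-assessment.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = (1, 2, 3, 4, 5, 6)

def sampling_distribution(n):
    """Frequency of each possible sample mean over all 6**n ordered samples of size n."""
    counts = Counter(Fraction(sum(sample), n) for sample in product(FACES, repeat=n))
    return dict(sorted(counts.items()))

for n in (1, 2, 3):
    total = len(FACES) ** n
    print(f"For n = {n}")
    for xbar, freq in sampling_distribution(n).items():
        print(f"  x̄ = {float(xbar):.2f}   frequency = {freq}   P(x̄) = {freq}/{total}")
```

Plotting the frequencies for n = 1, 2, 3 side by side shows the histogram changing from flat to mound-shaped, which is the behavior the Central Limit Theorem describes.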
The Central Limit Theorem describes the relationship between the sampling distribution of sample means and the population that the samples are taken from.
1. If samples of size n, where n ≥ 30, are drawn from any population with a mean μ and a standard deviation σ, then the sampling distribution of sample means approximates a normal distribution. The greater the sample size, the better the approximation.
2. If the population itself is normally distributed, then the sampling distribution of sample means is normally distributed for any sample size n.
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
or
$$z = \frac{\bar{x}-\mu_{\bar{x}}}{\sigma_{\bar{x}}}$$
EXAMPLE #1
The population mean monthly salary of Associate Professors is about ₱63 500. A random sample of 35 Associate Professors is drawn from the population. What is the probability that the mean salary of the sample is less than ₱60 000? Assume that σ = ₱6 100.
$$z = \frac{60\,000 - 63\,500}{6\,100/\sqrt{35}} = -3.39, \qquad P(z < -3.39) = 0.0003$$
Thus, the probability that the mean monthly salary of the sample of Associate Professors is less than ₱60 000 is 0.03%.
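The same steps can be sketched in Python. This is a minimal illustration, assuming the values in Example #1: it evaluates the standard normal CDF with `math.erf` instead of a printed z-table, and rounds z to two decimals to mimic how a z-table is read.

```python
import math

def normal_cdf(z):
    """P(Z < z) for a standard normal Z, computed with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 63_500, 6_100, 35   # peso values from Example #1
xbar = 60_000

standard_error = sigma / math.sqrt(n)
z = round((xbar - mu) / standard_error, 2)  # round to 2 decimals, like a z-table lookup
p = normal_cdf(z)

print(z)                  # -3.39
print(f"{100 * p:.2f}%")  # 0.03%
```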
EXAMPLE #2
Out of 150 teenage drivers, you randomly pick 50. What is the probability that their mean driving time each day is between 24.7 and 25.5 minutes? Assume that σ = 1.5 minutes and μ = 25 minutes.
If x̄ = 24.7,
$$z = \frac{24.7-25}{1.5/\sqrt{50}} = -1.41$$
If x̄ = 25.5,
$$z = \frac{25.5-25}{1.5/\sqrt{50}} = 2.36$$
Finding the probability to the left of each z,
$$P(z < -1.41) = 0.0793, \qquad P(z < 2.36) = 0.9909$$
$$P(-1.41 < z < 2.36) = 0.9909 - 0.0793 = 0.9116$$
Thus, the probability that the drivers' mean driving time is between 24.7 minutes and 25.5 minutes is 91.16%.
EXAMPLE #3
The mean NAT score of Grade 10 students is 65. Sixty students were chosen, and the standard deviation of their scores was found to be 5. What is the probability that their mean score is between 64 and 67?
If x̄ = 64,
$$z = \frac{64-65}{5/\sqrt{60}} = -1.55$$
If x̄ = 67,
$$z = \frac{67-65}{5/\sqrt{60}} = 3.10$$
$$P(-1.55 < z < 3.10) = P(z < 3.10) - P(z < -1.55) = 0.9990 - 0.0606 = 0.9384$$
Therefore, the probability that the mean score is between 64 and 67 is 0.9384 or 93.84%.
EXAMPLE #4
Suppose the mean amount of cholesterol in eggs labeled "large" is 186 milligrams, with a standard deviation of 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2 milligrams of the population mean.
"Within 2 milligrams" means 184 < x̄ < 188.
$$z = \frac{184-186}{7/\sqrt{144}} = -3.43, \qquad z = \frac{188-186}{7/\sqrt{144}} = 3.43$$
$$P(-3.43 < z < 3.43) = 0.9997 - 0.0003 = 0.9994$$
Therefore, the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2 milligrams of the population mean is 0.9994 or 99.94%.
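Examples #2 to #4 all ask for the probability that a sample mean falls in an interval. A minimal Python sketch that applies the z-formula to each, assuming (as the text does) that the sampling distribution of x̄ is approximately normal, and treating the sample standard deviation in Example #3 as σ. The helper name `prob_mean_between` is hypothetical. z-scores are rounded to two decimals like a table lookup; the results can still differ from the text in the fourth decimal, because the text subtracts rounded table entries.

```python
import math

def normal_cdf(z):
    """P(Z < z) for a standard normal Z, computed with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_mean_between(a, b, mu, sigma, n):
    """P(a < x̄ < b), using z = (x̄ - mu) / (sigma / sqrt(n)) rounded to 2 decimals."""
    se = sigma / math.sqrt(n)
    z_a = round((a - mu) / se, 2)
    z_b = round((b - mu) / se, 2)
    return normal_cdf(z_b) - normal_cdf(z_a)

p2 = prob_mean_between(24.7, 25.5, mu=25, sigma=1.5, n=50)  # Example #2: driving time
p3 = prob_mean_between(64, 67, mu=65, sigma=5, n=60)        # Example #3: NAT scores
p4 = prob_mean_between(184, 188, mu=186, sigma=7, n=144)    # Example #4: cholesterol

for label, p in [("#2", p2), ("#3", p3), ("#4", p4)]:
    print(label, round(p, 4))
```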
UNIT 5 CONFIDENCE INTERVALS
1 Confidence Intervals for the Mean (Large Samples)
2 Confidence Intervals for the Mean (Small Samples), t-distribution
3 Confidence Intervals for Population Proportion
You wish to find the leading candidate for president in the next election. Since it is impossible to ask all registered voters whom they will vote for, you conduct a survey of 5000 registered voters and find that 33% of them want Rodrigo Duterte to become the next president. Because this estimate is just a single number, you cannot tell whether it equals the true proportion. To estimate the result, you use a margin of error to obtain a range where the true proportion lies. In this case, a 1% margin of error means that, statistically, 32% to 34% want to vote for Duterte. In this unit, you will learn how to estimate a parameter in a given situation.
Lesson 1 CONFIDENCE INTERVALS FOR THE MEAN (LARGE SAMPLES)
PRE-ASSESSMENT:
At the end of this lesson, you are expected to:
illustrate point and interval estimations,
distinguish between point and interval estimation, and
compute the point estimate of the population mean.
Below is the frequency distribution table of a random sample of the weights (in kg) of Grade 11 students in Pamantasan ng Lungsod ng Valenzuela. Find the mean.
WEIGHT (in kg)    FREQUENCY
43-47             6
48-52             10
53-57             7
58-62             4
63-67             1
68-72             2
In this lesson, you will learn how to use sample statistics to make an estimate of the population
parameter when the sample size is at least 30 or when the population is normally distributed and the
standard deviation is known. To make such an inference, begin by finding a point estimate.
A point estimate is a single value estimate for a population parameter. The sample mean x̄ is the usual point estimate of the population mean μ because it is an unbiased estimator of μ.
An interval estimate is an interval, or range of values, used to estimate a population parameter.
Given a level of confidence c, the margin of error E (sometimes also called the maximum error of estimate or error tolerance) is the greatest possible distance between the point estimate and the value of the parameter it is estimating.
$$E = z_c\,\frac{\sigma}{\sqrt{n}}$$
CONFIDENCE INTERVALS FOR THE POPULATION MEAN
Using a point estimate and a margin of error, you can construct an interval estimate of a population parameter such as the population mean μ:
$$\bar{x} - E < \mu < \bar{x} + E$$
This interval estimate is called a confidence interval.
STEPS
1. Find the sample statistics n and x̄.
2. Specify σ if known. Otherwise, if n ≥ 30, find the sample standard deviation s and use it as an estimate for σ.
Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate the population mean μ is
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2$$
If σ is unknown, you can estimate it using s, provided you have a preliminary sample with at least 30 members.
Let's go back to the situation given in the pre-assessment. Solving for the mean of the given data,
WEIGHT (in kg)    FREQUENCY (f)    MIDPOINT (x)    fx
43-47             6                45              270
48-52             10               50              500
53-57             7                55              385
58-62             4                60              240
63-67             1                65              65
68-72             2                70              140
Total             30                               1600
$$\bar{x} = \frac{\sum fx}{n} = \frac{1600}{30} = 53.33 \text{ kg}$$
To estimate the population mean, the sample mean of 53.33 kg serves as the point estimate. Now, given a 95% confidence level, find the margin of error for the mean weight of the Grade 11 students of Pamantasan ng Lungsod ng Valenzuela. Assume that the population standard deviation is about 7 kg.
$$E = z_c\,\frac{\sigma}{\sqrt{n}}, \qquad z_c = 1.96,\ \sigma = 7,\ n = 30$$
$$E = (1.96)\left(\frac{7}{\sqrt{30}}\right) = 2.50$$
Thus, given the 95% confidence level, the margin of error for the population mean is 2.50 kg.
$$\bar{x} - E < \mu < \bar{x} + E$$
$$53.33 - 2.50 < \mu < 53.33 + 2.50$$
$$50.83 < \mu < 55.83$$
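The grouped-data mean and the 95% interval can be sketched together in Python. This is a minimal example, assuming the frequency table above and the text's assumed σ = 7 kg. The code does not round intermediate values, so the upper limit prints as 55.84 rather than 55.83; the text adds the already-rounded values 53.33 and 2.50.

```python
import math

# Frequency table from the pre-assessment (weights in kg).
classes = [(43, 47), (48, 52), (53, 57), (58, 62), (63, 67), (68, 72)]
freqs = [6, 10, 7, 4, 1, 2]

midpoints = [(low + high) / 2 for low, high in classes]  # 45.0, 50.0, ..., 70.0
n = sum(freqs)
xbar = sum(f * x for f, x in zip(freqs, midpoints)) / n  # mean of grouped data

z_c = 1.96  # critical value for 95% confidence
sigma = 7   # population standard deviation assumed in the text (kg)
E = z_c * sigma / math.sqrt(n)

print(n, round(xbar, 2))  # 30 53.33
print(f"E = {E:.2f} kg")
print(f"{xbar - E:.2f} < mu < {xbar + E:.2f}")
```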
From a random sample of 60 days of the year 2020, Philippine gasoline prices had a mean of ₱60.25 and a standard deviation of ₱21.75. Construct the 90%, 95%, and 99% confidence intervals for the population mean.
Since n ≥ 30, s = 21.75 is used as an estimate for σ.
$$E = 1.645\left(\frac{21.75}{\sqrt{60}}\right) = 4.62$$
$$\bar{x} - E < \mu < \bar{x} + E$$
$$60.25 - 4.62 < \mu < 60.25 + 4.62$$
$$\mathbf{55.63} < \mu < \mathbf{64.87}$$
With 90% confidence, the population mean price of gasoline in the Philippines in 2020 is between ₱55.63 and ₱64.87.
$$E = 1.96\left(\frac{21.75}{\sqrt{60}}\right) = 5.50$$
$$\bar{x} - E < \mu < \bar{x} + E$$
$$60.25 - 5.50 < \mu < 60.25 + 5.50$$
$$\mathbf{54.75} < \mu < \mathbf{65.75}$$
With 95% confidence, the population mean price of gasoline in the Philippines in 2020 is between ₱54.75 and ₱65.75.
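EXAMPLE #3
a. Here RE and LE are the right and left endpoints of the confidence interval.
$$E = \frac{RE - LE}{2} = \frac{36.92 - 35.08}{2} = \mathbf{0.92}$$
$$L = RE - LE = 36.92 - 35.08 = \mathbf{1.84}$$
Hence, the margin of error is 0.92 and the length of the confidence interval is 1.84.
b.
$$E = z_c\,\frac{\sigma}{\sqrt{n}} = (1.96)\left(\frac{0.60}{\sqrt{44}}\right) = \mathbf{0.18}$$
$$L = 2E = 2(0.18) = \mathbf{0.36}$$
Thus, the margin of error is 0.18 and the length of the confidence interval is 0.36.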
EXAMPLE #4
Given E = 75 and σ = 250, find the minimum sample size if the confidence level is (a) 90%, (b) 95%, and (c) 99%.
With 90% confidence level,
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2 = \left(\frac{1.645\cdot 250}{75}\right)^2 = 30.07$$
The minimum sample size is 31.
With 95% confidence level,
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2 = \left(\frac{1.96\cdot 250}{75}\right)^2 = 42.68$$
The minimum sample size is 43.
With 99% confidence level,
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2 = \left(\frac{2.58\cdot 250}{75}\right)^2 = 73.96$$
The minimum sample size is 74.
EXAMPLE #5
A company president wishes to estimate the average number of hours his part-time employees work per week. The standard deviation from a previous study is 9.3 hours. How large a sample must be selected if he wants to be 99% confident that the sample mean is within 4 hours of the true mean?
With 99% confidence level,
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2 = \left(\frac{2.58\cdot 9.3}{4}\right)^2 = 35.98$$
The minimum sample size is 36.
A researcher found that the IQ scores of the ALS students in the Division of Valenzuela are normally distributed with a mean of 110 and a standard deviation of 10. How many ALS students need to be tested so that the estimate is within 5 points of the population mean with a 99% level of confidence?
With 99% confidence level,
$$n = \left(\frac{z_c\,\sigma}{E}\right)^2 = \left(\frac{2.58\cdot 10}{5}\right)^2 = 26.63$$
Therefore, 27 ALS students need to be tested so that the estimate will not be more than 5 points from the population mean with a 99% level of confidence.
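The sample-size formula and the round-up rule can be captured in one small function. A minimal Python sketch, assuming the z-values used in this lesson; the function name `min_sample_size` is hypothetical. `math.ceil` always rounds up, so 30.07 becomes 31 even though its decimal part is small.

```python
import math

def min_sample_size(z_c, sigma, E):
    """Smallest whole n with n >= (z_c * sigma / E)^2 (always rounds UP)."""
    return math.ceil((z_c * sigma / E) ** 2)

# Example #4: E = 75, sigma = 250
for c, z_c in [(0.90, 1.645), (0.95, 1.96), (0.99, 2.58)]:
    print(f"{c:.0%}: n = {min_sample_size(z_c, 250, 75)}")

print(min_sample_size(2.58, 9.3, 4))  # Example #5: part-time work hours
print(min_sample_size(2.58, 10, 5))   # ALS IQ scores
```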
NOTES
Increasing the confidence level also increases the margin of error, which gives a wider interval for the population mean.
As the level of confidence increases, the confidence interval widens. As the confidence interval widens, the precision of the estimate decreases. To keep the same precision at a higher confidence level, the sample size must increase.
For the minimum sample size, round the result UP to the next whole number, even when the decimal part is less than 0.5.
There are three (3) factors that influence sample size determination: (1) level of
confidence, (2) population standard deviation, and (3) the margin of error.
Researchers can control margin of error and confidence level. The less error you are
willing to accept, the bigger the sample size needs to be. Also, the more confident you
want to be, the bigger the sample size needs to be.
Lesson 2 CONFIDENCE INTERVALS FOR THE MEAN (SMALL SAMPLES)
PRE-ASSESSMENT:
At the end of this lesson, you are expected to:
illustrate the t-distribution,
identify regions under the t-distribution corresponding to t-values,
compute the confidence interval estimate based on the appropriate form of the estimator for the population mean, and
solve problems involving confidence interval estimation of the population mean.
Given that the sample mean is 150.5, σ = 30.25, and n = 50, find the confidence interval if the confidence level is (a) 90%, (b) 95%, and (c) 99%.
In many real-life situations, the population standard deviation is unknown. Moreover, because of various constraints such as time and cost, it is often not practical to collect samples of size 30 or more. So, how can you construct a confidence interval for a population mean in such circumstances? If the random variable is normally distributed (or approximately normally distributed), you can use a t-distribution.
t-DISTRIBUTION
Critical values of t are denoted by $t_c$. Several properties of the t-distribution are as follows.
1. The t-distribution is bell-shaped and symmetric about the mean.
2. The t-distribution is a family of curves, each determined by a parameter called the degrees of freedom. The degrees of freedom are the number of free choices left after a sample statistic such as x̄ is calculated. When you use a t-distribution to estimate a population mean, the degrees of freedom are equal to one less than the sample size.
Degrees of freedom: d.f. = n − 1
3. The total area under a t-curve is 1 or 100%.
4. The mean, median, and mode of the t-distribution are equal to 0.
5. As the degrees of freedom increase, the t-distribution approaches the normal distribution. After 30 d.f., the t-distribution is very close to the standard normal z-distribution.
ILLUSTRATION OF DEGREES OF FREEDOM
Suppose the number of chairs in your classroom equals the number of students: 20 chairs for 20 students. Each of the first 19 students has a choice of which chair to sit in. There is no freedom of choice, however, for the 20th student who enters the room.
t-table
EXAMPLE #1
Find the critical value 𝒕𝒄 for a 90% confidence level when the sample size is 14.
𝑛 = 14
𝑑𝑓 = 𝑛 − 1 = 14 − 1 = 13
𝑐 = 90%
𝑡𝑐 = ±1.771
EXAMPLE #2
Find the critical value 𝒕𝒄 for a 95% confidence level when the sample size is 20.
𝑛 = 20
𝑑𝑓 = 𝑛 − 1 = 20 − 1 = 19
𝑐 = 95%
𝑡𝑐 = ±2.093
CONFIDENCE INTERVALS AND t-DISTRIBUTIONS
Constructing a confidence interval using the t-distribution is similar to constructing a confidence interval using the normal distribution: both use a point estimate and a margin of error E.
$$E = t_c\,\frac{s}{\sqrt{n}}$$
4. Find the left and right endpoints and form the confidence interval.
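Find the margin of error if s = 5, n = 16, and the confidence level is (a) 90%, (b) 95%, (c) 99%.
s = 5, n = 16; df = n − 1 = 16 − 1 = 15
(a) c = 90%, $t_c$ = 1.753
$$E = t_c\,\frac{s}{\sqrt{n}} = 1.753\left(\frac{5}{\sqrt{16}}\right) = \mathbf{2.19}$$
(b) c = 95%, $t_c$ = 2.131
$$E = t_c\,\frac{s}{\sqrt{n}} = 2.131\left(\frac{5}{\sqrt{16}}\right) = \mathbf{2.66}$$
(c) c = 99%, $t_c$ = 2.947
$$E = t_c\,\frac{s}{\sqrt{n}} = 2.947\left(\frac{5}{\sqrt{16}}\right) = \mathbf{3.68}$$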
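A minimal Python sketch of the three margins of error, assuming the t-table values for df = 15 quoted in the example. Only $t_c$ changes between the three cases, because s and n are fixed.

```python
import math

s, n = 5, 16
df = n - 1  # 15 degrees of freedom
# Critical values for df = 15, as read from the t-table in the example.
t_table = {0.90: 1.753, 0.95: 2.131, 0.99: 2.947}

margins = {c: t_c * s / math.sqrt(n) for c, t_c in t_table.items()}
for c, E in margins.items():
    print(f"{c:.0%}: E = {E:.2f}")
```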
EXAMPLE #4
$$(\bar{x} - E) < \mu < (\bar{x} + E)$$
EXAMPLE #5
You randomly select 16 coffee shops and measure the temperature of the coffee sold at each. The sample mean temperature is 162.0℉ with a sample standard deviation of 10.0℉. Construct a 95% confidence interval for the population mean temperature. Assume the temperatures are approximately normally distributed.
With df = 16 − 1 = 15 and c = 95%, $t_c$ = 2.131.
$$E = t_c\,\frac{s}{\sqrt{n}} = 2.131\left(\frac{10}{\sqrt{16}}\right) = \mathbf{5.33}$$
$$162.0 - 5.33 < \mu < 162.0 + 5.33$$
$$156.67 < \mu < 167.33$$
With 95% confidence, the population mean temperature of coffee sold in coffee shops is between 156.67℉ and 167.33℉.
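The four steps for a t-interval can be sketched in a few lines of Python. This is a minimal example, assuming the coffee-shop data and the table value $t_c$ = 2.131 for df = 15 at 95% confidence.

```python
import math

n, xbar, s = 16, 162.0, 10.0  # sample size, mean (°F), sample standard deviation (°F)
t_c = 2.131                    # t-table value for df = n - 1 = 15 and c = 95%

E = t_c * s / math.sqrt(n)     # margin of error
lower, upper = xbar - E, xbar + E
print(f"E = {E:.2f}; {lower:.2f} < mu < {upper:.2f}")
```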
Lesson 3 CONFIDENCE INTERVALS FOR POPULATION PROPORTIONS
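The point estimate for p, the population proportion of successes, is
$$\hat{p} = \frac{x}{n}$$
where x is the number of successes in the sample and n is the sample size. The point estimate for the population proportion of failures is $\hat{q} = 1 - \hat{p}$. The symbols $\hat{p}$ and $\hat{q}$ are read as "p hat" and "q hat."
A c-confidence interval for the population proportion p is
$$\hat{p} - E < p < \hat{p} + E$$
where p is the population proportion, E is the margin of error, $\hat{p} - E$ is the lower confidence limit, and $\hat{p} + E$ is the upper confidence limit. The margin of error is
$$E = z_c\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$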
EXAMPLE #1
In a survey of 1000 adults, 373 said that it is acceptable to legalize divorce in the country.
a. Find a point estimate for the population proportion of adults who say it is acceptable to legalize divorce in the country.
b. Construct a 95% confidence interval for the population proportion of adults who say that it is acceptable to legalize divorce in the country.
$$\hat{p} = \frac{x}{n} = \frac{373}{1000} = \mathbf{0.373}$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.373 = 0.627$$
With $z_c = 1.96$,
$$E = z_c\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (1.96)\sqrt{\frac{0.373(0.627)}{1000}} = 0.030$$
$$\hat{p} - E < p < \hat{p} + E$$
$$0.373 - 0.030 < p < 0.373 + 0.030$$
$$\mathbf{0.343} < p < \mathbf{0.403}$$
Hence, with 95% confidence, the population proportion of adults who say that it is acceptable to legalize divorce in the country is between 34.3% and 40.3%.
EXAMPLE #2
In a survey of 2000 Filipinos (aged 16-25), 1231 said that BlackPink is the best KPOP girl group in Asia.
a. Find a point estimate for the population proportion of Filipinos who say BlackPink is the best KPOP girl group in Asia.
b. Construct a 90% confidence interval for the population proportion of Filipinos who say BlackPink is the best KPOP girl group in Asia.
$$\hat{p} = \frac{x}{n} = \frac{1231}{2000} = \mathbf{0.6155}$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.6155 = 0.3845$$
With $z_c = 1.645$,
$$E = z_c\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (1.645)\sqrt{\frac{0.6155(0.3845)}{2000}} = 0.0179$$
$$\hat{p} - E < p < \hat{p} + E$$
$$0.6155 - 0.0179 < p < 0.6155 + 0.0179$$
$$\mathbf{0.5976} < p < \mathbf{0.6334}$$
Hence, with 90% confidence, the population proportion of Filipinos who say BlackPink is the best KPOP girl group in Asia is between 59.76% and 63.34%.
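Both proportion examples follow the same three steps: point estimate, margin of error, interval. A minimal Python sketch, assuming the survey counts and z-values from the two examples; the helper name `proportion_interval` is hypothetical. The code keeps E unrounded, so the divorce interval prints as 0.3430 to 0.4030.

```python
import math

def proportion_interval(x, n, z_c):
    """Return p̂ = x/n, the margin of error E, and the interval (p̂ - E, p̂ + E)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    E = z_c * math.sqrt(p_hat * q_hat / n)
    return p_hat, E, (p_hat - E, p_hat + E)

# Example #1: 373 of 1000 adults, 95% confidence; Example #2: 1231 of 2000, 90% confidence.
for x, n, z_c in [(373, 1000, 1.96), (1231, 2000, 1.645)]:
    p_hat, E, (lo, hi) = proportion_interval(x, n, z_c)
    print(f"p̂ = {p_hat:.4f}, E = {E:.4f}, {lo:.4f} < p < {hi:.4f}")
```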
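Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate the population proportion p is
$$n = \hat{p}\,\hat{q}\left(\frac{z_c}{E}\right)^2$$
This formula assumes that you have preliminary estimates of $\hat{p}$ and $\hat{q}$. If not, use $\hat{p} = \hat{q} = 0.5$.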
EXAMPLE #3
Miriam is running for President and wishes to estimate, with 95% confidence, the population proportion of registered voters who will vote for her. Her estimate must be accurate within 3% of the population proportion. Find the minimum sample size needed if (a) no preliminary estimate is available and (b) a preliminary estimate gives $\hat{p} = 0.31$.
(a) With no preliminary estimate, $\hat{p} = \hat{q} = 0.5$:
$$n = \hat{p}\,\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.11$$
The minimum sample size for no preliminary estimate is 1068 registered voters.
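A minimal Python sketch of the sample-size formula for a proportion, assuming z_c = 1.96 and E = 0.03 as in the example; the function name is hypothetical. It defaults to p̂ = 0.5 when no preliminary estimate exists and rounds up with `math.ceil`. Under the same formula, part (b) with p̂ = 0.31 gives 0.31(0.69)(1.96/0.03)² ≈ 913.02, which rounds up to 914.

```python
import math

def min_sample_size_proportion(z_c, E, p_hat=0.5):
    """n = p̂ q̂ (z_c / E)^2, rounded UP; p̂ = 0.5 when no preliminary estimate exists."""
    q_hat = 1 - p_hat
    return math.ceil(p_hat * q_hat * (z_c / E) ** 2)

print(min_sample_size_proportion(1.96, 0.03))              # (a) no preliminary estimate
print(min_sample_size_proportion(1.96, 0.03, p_hat=0.31))  # (b) preliminary p̂ = 0.31
```

Using p̂ = 0.5 gives the largest possible value of p̂q̂, so it is the conservative choice when nothing is known in advance.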