Statistics and Probability Module

Uploaded by

jhoyjhoyorantes
Copyright © All Rights Reserved

Immaculada Concepcion College

Of Soldier’s Hills Caloocan City, Inc.


Soldier’s Hills III Subd. Brgy. 180, Tala, North Caloocan City

Name: _________________________________________ Strand and Section: _________________________________

Teacher: _______________________________________ Period covered: ____________________________

Week # ____1___

LESSON 1: RANDOM VARIABLES

I. INTRODUCTION
Basketball players are accepted or drafted based on their game performance and other relevant
characteristics. If you were the manager of a basketball team, how would you answer the following questions?
1. What information should be obtained to select the player your team needs?
2. How do you count or measure the information needed for making decisions?

II. LESSON OBJECTIVES


a. Gain a broad view of random variables.
b. Appreciate the significance of random variables.
c. Apply the concepts of random variables in any field of interest.

III. PRE-ASSESSMENT


Classify each random variable as discrete or continuous.
__________1. Number of women among 10 newly hired teachers
__________2. Height (in inches) of a randomly selected adult male
__________3. Number of car accidents in 8 selected cities
__________4. Amount of rainfall (in mm) in the different cities in Metro Manila
__________5. Number of gifts received by 20 students during the Christmas season
__________6. Weight (in grams) of 8 randomly selected Math books
__________7. Cost (rounded to the nearest Php) of a Statistics book
__________8. Number of eggs a hen lays
__________9. The amount of milk obtained from a cow
__________10. Average temperature (in ℃) in Baguio City for the past 5 days

IV. LESSON CONTENT

To answer the questions posed, we need to know certain basic concepts.
An element is the source of relevant information or data, i.e., an individual, entity, or population unit.
A variable is a characteristic of the elements that is measured or observed.
A random variable is a variable being measured to produce numerical observations associated with the random
outcomes of a chance experiment.
Observations are the numerical values obtained by measuring the variable.

There are two types of random variables:


1. Discrete random variables are random variables whose observed numerical values are produced
by counting and take on whole numbers only.
2. Continuous random variables are random variables whose observed numerical values result from
measuring and may take on any value within a numerical interval.

To answer the focus question, let us apply and illustrate the concepts using this table.
Element: Players. Random variables: all other columns, of which the first three are discrete and the last four are continuous.

Players | Points per game | No. of rebounds per game | No. of assists per game | Playing time per game (in mins) | Field goal (%) | Height (in m) | Weight (in lbs)
A | 5 | 2 | 4 | 4.65 | 85 | 1.83 | 165
B | 10 | 3 | 4 | 5.9 | 80 | 1.88 | 175
C | 18 | 5 | 6 | 6.7 | 75.7 | 1.96 | 195
D | 22 | 7 | 8 | 8 | 68.4 | 2.06 | 210
E | 20 | 4 | 10 | 7.5 | 50 | 1.93 | 205
F | 11 | 3 | 15 | 6 | 45.3 | 1.83 | 160
G | 4 | 2 | 18 | 5 | 38.9 | 1.80 | 158.2


ALL RIGHTS RESERVED: IMMACULADA CONCEPCION COLLEGE (STATISTICS AND PROBABILITY) PAGE 1 OF 48

In this table, "Players" is considered the element, while "Points per game", "Number of rebounds per game",
etc. are the random variables. Of these, the first three are discrete random variables, while the last four are
continuous random variables.

The variable "Points per game" is a discrete random variable since its observations are whole
numbers from 4 to 22. Since playing time per game is measured in minutes, its observations can take on any value
from 4.65 minutes to 8 minutes; it is therefore a continuous random variable.

To summarize, consider the following:

Discrete Random Variables (observation x is a whole number) | Continuous Random Variables (observation x is any point within the range)
Points per game: 4 ≤ x ≤ 22 | Playing time per game: 4.65 ≤ x ≤ 8
Rebounds per game: 2 ≤ x ≤ 7 | Field goal %: 38.9 ≤ x ≤ 85
Assists per game: 4 ≤ x ≤ 18 | Height (in meters): 1.80 ≤ x ≤ 2.06
 | Weight (in lbs): 158.2 ≤ x ≤ 210
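The ranges in the summary are just the smallest and largest observation in each column of the players' table. A minimal Python sketch (an illustration added here, not part of the original lesson) that recomputes them; the helper name `observed_ranges` is hypothetical, and the discrete/continuous labels are typed in from the lesson text rather than inferred from the numbers, because the type depends on counting versus measuring.

```python
# Basketball table from Lesson 1 (players A to G), typed in by hand.
# The "discrete"/"continuous" labels follow the lesson text; they are NOT
# inferred from the data, since the type depends on counting vs. measuring.
table = {
    "Points per game": ("discrete", [5, 10, 18, 22, 20, 11, 4]),
    "Rebounds per game": ("discrete", [2, 3, 5, 7, 4, 3, 2]),
    "Assists per game": ("discrete", [4, 4, 6, 8, 10, 15, 18]),
    "Playing time per game (mins)": ("continuous", [4.65, 5.9, 6.7, 8, 7.5, 6, 5]),
    "Field goal (%)": ("continuous", [85, 80, 75.7, 68.4, 50, 45.3, 38.9]),
    "Height (m)": ("continuous", [1.83, 1.88, 1.96, 2.06, 1.93, 1.83, 1.80]),
    "Weight (lbs)": ("continuous", [165, 175, 195, 210, 205, 160, 158.2]),
}


def observed_ranges(data):
    """Return {variable: (type, smallest observation, largest observation)}."""
    return {name: (kind, min(values), max(values)) for name, (kind, values) in data.items()}


for name, (kind, low, high) in observed_ranges(table).items():
    print(f"{name:30s} {kind:10s} {low} <= x <= {high}")
```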

V. PRACTICE

Let us consider another example. For each of the experiments to be performed, determine the possible
observations that can be made, and classify the variable according to type.

Experiment | Element | Random Variable (x)
Rolling two dice | Two dice colored red and blue | Sum of the numbers on the two dice
Free-falling objects | Objects with different weights | Acceleration (meters/second²)

When one die is rolled, there are six possible observations, corresponding to the numbers 1 to 6. Since the random
variable is the sum of the numbers on the two dice, the possible observations are the sums of the two faces,
such as 1 + 1 = 2, 1 + 2 = 3, or 6 + 6 = 12. Therefore, x can be any whole number from 2 to 12, written
mathematically as 2 ≤ x ≤ 12. The variable is thus discrete.
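The claim that the sum of two dice runs over the whole numbers 2 to 12 can be checked by listing every outcome. A short Python sketch (added for illustration, using only the standard library's `itertools.product`) that enumerates all 36 (red, blue) pairs and collects their sums:

```python
from itertools import product

# Every (red, blue) outcome when two six-sided dice are rolled: 6 x 6 = 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# The random variable x is the sum of the two faces.
possible_sums = sorted({red + blue for red, blue in outcomes})

print(len(outcomes))    # 36
print(possible_sums)    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Every possible value is a whole number, which is what makes this random variable discrete.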

For free-falling objects, acceleration, measured in meters per second squared (distance per unit time squared),
may take any value greater than or equal to 0, written mathematically as x ≥ 0, and thus the variable is
continuous.

VI. ENRICHMENT

Many of the improvements the world enjoys today can be attributed to understanding the nature of the
information or data produced by experiments. Such data are the values of certain variables observed at random
as outcomes of a chance experiment.
Let us apply and illustrate the concepts we have learned in the table below.

Fill in the blanks to complete the information table.

Experiment | Element | Random Variable (x) | Observation (numerical values) | Type of variable
Customer arrival | Restaurants with a seating capacity of 10 | ________ | ________ | ________
________ | Roads from farm to market | Percentage completion | ________ | ________
Student enrollment | Schools | ________ | 0 if female, 1 if male | ________
________ | Business companies | Gross sales (in millions of pesos) | ________ | ________
Observing newborn babies | ________ | Birth weights (in lbs.) | ________ | ________


VII. EVALUATION

A. Classify the following according to type of variable.


____________1. The number of goods sold in a retail store
____________2. Volume of gasoline consumed by an automatic car
____________3. Names listed in a voting center.
____________4. Outcomes when tossing a coin
____________5. Temperature observed in Kelvin units.
____________6. Annual gross sales in a supermarket.
____________7. Actual number of metric tons loaded in a container van
____________8. The angle of elevation projected by missile launchers
____________9. Type of blood extracted from patients.
____________10. The rate of return on investment (ROI)
____________11. pH of an unknown liquid
____________12. Color of hair
____________13. Diastolic blood pressure
____________14. Weight of an overactive pituitary gland
____________15. Thickness of a book
____________16. Red blood cell count
____________17. Volume of water in the lungs of an infected patient
____________18. Presence or absence of a certain disease
____________19. Color of the conjunctiva of the eyes
____________20. Length of incubation for an ostrich egg

B. Give an example of a discrete variable and a continuous variable that would be of interest to each of the following:
Discrete Variable Continuous Variable
1. Biologist _____________________________ _____________________________
2. Accountant _____________________________ _____________________________
3. Economist _____________________________ _____________________________
4. Engineer _____________________________ _____________________________
5. Chef _____________________________ _____________________________
6. Computer game developer _____________________________ _____________________________

C. Scholars of physical science devote much of their time to performing experiments. They are interested in
verifying theories in areas such as physics, astronomy, geology, and chemistry based on the data
resulting from experiments. The following variables have been gathered under various conditions.
Which are discrete and which are continuous variables?

____________1. The time of flight of a projectile
____________2. The components of vectors
____________3. The force of gravity on an object
____________4. The number of satellites orbiting the earth
____________5. The quantity of matter in an object.
____________6. Magnitude of atoms in a certain molecule
____________7. The number of meteorites hitting a satellite per day
____________8. The speed of light from the earth to the moon
____________9. The temperature of water.
____________10. The distance traveled by a moving car.


Week # _____2____

LESSON 2: Probability Distribution of Discrete Random Variable

I. INTRODUCTION
The discovery of patterns in the likelihood of outcomes (probability distributions) paved the way for
forecasting and estimating significant results of related variables. This lesson aims to build an
understanding of these concepts and to explore their applications.

II. LESSON OBJECTIVES


a. Gain a broad view of probability distributions.
b. Appreciate the significance of probability distributions.
c. Apply the concepts of random variables and their probability distributions in any field of interest.

III. PRE-ASSESSMENT


A laboratory supervisor in a Type III hospital is investigating the number of reported on-the-job training accidents
involving needle-stick injuries over a period of one month. Based on past records, she has the following
data on recorded needle-stick injuries:

Needle-stick injuries reported (x) | Frequency (f) | P(x)
0 | 300 |
1 | 10 |
2 | 1 |
3 | 5 |
4 | 2 |
5 | 1 |

1. Complete the table.


2. What is the probability distribution of the discrete random variable being considered?

IV. LESSON CONTENT


Experiment: Rolling a Die
When you roll a die, there are only six possible outcomes, corresponding to the six faces of the die: the numbers
1 to 6. Suppose you roll the die twenty times; you will then have twenty observed outcomes. Let us record the
results of rolling the die 20 times.

Possible face of the die after each roll (x) | No. of times the face appeared [(x):n]
1 | 3
2 | 7
3 | 4
4 | 3
5 | 2
6 | 1

1. What is the chance that when a die is rolled, the number 2 will appear? The number 5?
2. How can we show graphically the probability of the occurrence of an event?

Recall that since the data obtained by rolling a die are whole numbers from 1 to 6, the variable is a
discrete random variable. Let us consider how to describe a discrete random variable.
Since the die is rolled 20 times, the total number of observations in the experiment is N = 20.
From the table, the number of times the outcome "2" (x = 2) occurred, [(x):n], is 7, written [(2):7].
The chance that a "2" will appear when a die is rolled is the number of occurrences of that value, [(x):n],
divided by the total number of observations N. Thus, we get 7/20. This is also known as
the probability of occurrence.
The probability mass function obtained from these results (the relative frequency of each outcome) is given by

$$P(x) = \frac{[(x):n]}{N}$$

We extend this and say that the probability that a “5” will appear when a die is rolled is 2/20.
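The rule P(x) = [(x):n] / N is a single division per outcome. A minimal Python sketch (an illustration, not part of the module) that applies it to the 20 recorded rolls; `fractions.Fraction` keeps the results exact, so 2/20 is shown in lowest terms as 1/10.

```python
from fractions import Fraction

# Results of rolling the die 20 times: face x -> number of times it appeared, [(x):n].
counts = {1: 3, 2: 7, 3: 4, 4: 3, 5: 2, 6: 1}

N = sum(counts.values())  # total number of observations

# P(x) = [(x):n] / N, kept as exact fractions.
prob = {x: Fraction(n, N) for x, n in counts.items()}

print(N)                          # 20
print(prob[2], float(prob[2]))    # 7/20 0.35
print(prob[5], float(prob[5]))    # 1/10 0.1
```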


To represent the distribution graphically, we can use a histogram.


The histogram is constructed as follows:
1. Plot the probability values on the vertical axis.
2. Plot the values of the discrete random variable (x) on the horizontal axis.
3. Use thin lines or sticks in place of thick bars.
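If no graph paper is at hand, the three steps can be imitated with a rough text chart. This Python sketch is only an illustration under stated assumptions: the helper name `stick_chart` and the 0.05 grid step are made up for this example, and the probabilities are the die-rolling values n/20 (3/20, 7/20, 4/20, 3/20, 2/20, 1/20).

```python
# A rough text version of the stick histogram: probabilities on the vertical
# axis, outcomes x on the horizontal axis, one thin stick per outcome.
prob = {1: 0.15, 2: 0.35, 3: 0.20, 4: 0.15, 5: 0.10, 6: 0.05}


def stick_chart(p, step=0.05):
    """Return the chart as a list of text rows, tallest level first."""
    top = max(p.values())
    levels = round(top / step)  # number of rows on the vertical axis
    rows = []
    for k in range(levels, 0, -1):
        level = k * step
        # Draw a '|' where the outcome's probability reaches this level
        # (the small tolerance absorbs floating-point error).
        cells = ["|" if p[x] >= level - 1e-9 else " " for x in sorted(p)]
        rows.append(f"{level:4.2f} " + "  ".join(cells))
    rows.append("     " + "  ".join(str(x) for x in sorted(p)))  # x-axis labels
    return rows


print("\n".join(stick_chart(prob)))
```

Each stick is as tall as its probability, so the outcome 2 (probability 0.35) has the longest stick.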

Let us construct the following table for the die-rolling experiment.

Outcomes (x) | No. of occurrences [(x):n] | [(x):n] / N | P(x)
1 | 3 | 3/20 | 0.15
2 | 7 | 7/20 | 0.35
3 | 4 | 4/20 | 0.20
4 | 3 | 3/20 | 0.15
5 | 2 | 2/20 | 0.10
6 | 1 | 1/20 | 0.05
Total | N = 20 | | ∑P(x) = 1.00

The following is the histogram showing the probability distribution of the die-rolling experiment.
From the above, the following observations can be made.
1. In a discrete probability distribution, the probability values of all possible outcomes are greater than
or equal to zero [P(x) ≥ 0] (1st descriptive condition).
2. The probability values of all the outcomes add up to one [∑P(x) = 1]
(2nd descriptive condition).
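The two descriptive conditions can be checked mechanically. A minimal Python sketch (the function name `is_valid_distribution` is hypothetical, chosen for this illustration) that tests both conditions on the die-rolling distribution; exact fractions are used so the sum is exactly 1 (with decimals such as 0.15 + 0.35 + ..., a computer may need a small tolerance).

```python
from fractions import Fraction


def is_valid_distribution(p):
    """Check the two conditions: every P(x) >= 0 and the P(x) values sum to 1."""
    all_nonnegative = all(value >= 0 for value in p.values())
    sums_to_one = sum(p.values()) == 1
    return all_nonnegative and sums_to_one


# Die-rolling experiment: P(x) = (number of occurrences) / 20.
die = {x: Fraction(n, 20) for x, n in {1: 3, 2: 7, 3: 4, 4: 3, 5: 2, 6: 1}.items()}
print(is_valid_distribution(die))  # True

# A made-up table that breaks the second condition (its sum is 1.1), for contrast.
bad = {0: Fraction(1, 2), 1: Fraction(6, 10)}
print(is_valid_distribution(bad))  # False
```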

V. PRACTICE

To illustrate further, let us consider the following:

A common selection criterion used by sports teams is the number of points a prospective
player scores in every game played. A coaching staff obtained the following data on two players.

Points per game | No. of games played: Player A | No. of games played: Player B
0 | 2 | 0
1 | 3 | 16
2 | 5 | 14
3 | 10 | 12
4 | 15 | 8
5 | 4 | 2
6 | 8 | 3
7 | 5 | 1
Total | 60 | 60

1. What kind of information or data should be obtained to select the player your team needs?
2. What can be done statistically to get the best player?
3. How can we describe completely the data given?
4. How can we visually represent this distribution with a graph?
5. Between Player A and Player B, who is the better player?


VI. ENRICHMENT

The histograms below give a better picture of the performance of the two players, based on the probability
of scoring a given number of points per game.

[Figure: histograms titled "Points of Player A" and "Points of Player B"; horizontal axis: points per game (0 to 7); vertical axis: probability (0 to 0.3)]

Activity:
Compute the probability for each value of the random variable x, then draw its histogram.
A. Given the variable x and the frequency of its occurrence:

x | ƒ | P(x)
5 | 3 |
10 | 8 |
15 | 26 |
20 | 10 |
25 | 3 |


VII. EVALUATION

1. X is a discrete random variable with the probability distribution given below.

x | P(x) = [(x):n] / N
100 | 0.38
250 | 0.30
380 | 0.17
420 | 0.10
510 | 0.05

Answer the following:


_____________1. What is the probability that x = 250?
_____________2. What is the probability that x is below 420?
_____________3. What is the probability that x is greater than 100 but less than 420?
_____________4. What is the probability that x is greater than or equal to 380?
_____________5. What makes the distribution a valid discrete probability distribution?

2. Complete the table below.

x | ƒ | P(x)
1-5 | | 0.10
6-10 | |
11-15 | | 0.25
16-20 | | 0.50

3. A real estate broker needs to advertise 2 townhouses, 2 single detached homes, and 2 duplexes. However, the
broker decides to choose at random only one of the six properties for an open house on a certain weekend. Let
the random variable x take on the value:
a. if a townhouse is chosen,
b. if a single detached home is chosen,
c. if a duplex is chosen.

4. Which of the following is a property of a discrete probability distribution?


a. The sum of the probabilities for the values of the random variable is 1.
b. The probability for every value of a random variable is positive and not greater than 1.
c. At least one of the values of a random variable has a probability equal to 5.
d. The probabilities for any two different values of a random variable are different.

5. In a batch of circuit boards, there is 1 board that needs to be returned to the factory, 2 boards that need
repair but do not need to be sent back to the factory, and 5 boards that are in good working condition.
Answer the following:
a. What is the probability that a circuit board selected at random needs to be returned to the factory?
b. What is the probability that a circuit board is in good working condition?
c. What is the probability that a circuit board needs repair?
d. Construct a probability distribution table.


Week # ___3___

LESSON 3: Mean of Discrete Probability Distribution & Variance of Discrete Probability Distribution

I. INTRODUCTION

The discovery of patterns in the likelihood of outcomes (probability distributions) paved the way for
forecasting and estimating significant results of related variables. This lesson aims to build an
understanding of these concepts and to explore their applications.

II. LESSON OBJECTIVES


a. Gain a broad view of probability distributions.
b. Appreciate the significance of probability distributions.
c. Apply the concepts of random variables and their probability distributions in any field of interest.

III. PRE-ASSESSMENT

Solve the following problems.

1. What is the variance of the results if a number is drawn from a jar containing numbers 2, 3, 4, 5, and 6?
2. What is the standard deviation of the results if a number is picked from a jar containing two 2s, three 4s, and
five 6s?
3. By investing in a particular stock, Florence can make $40 in a month with a probability of 0.2 or take a loss of
$10 with a probability of 0.8. What are the variance and standard deviation?
4. A box contains balls numbered 1 through 5. You are to draw a ball from the box; you will be paid 12
chips if the number is even, but you will pay 7 chips if the number is odd. What is the standard
deviation of the possible results?
5. Your father said that if your grade in math for one grading period is 90 or above, he will add 50Php to your daily
allowance; he will add 20Php if your grade is 80-89, but decrease it by 10Php if your grade is 79 or below. If the
probability of getting 90 or above is 12% while you have a 45% chance of getting 80-89, what is the standard deviation
of your possible allowance per quarter?

IV. LESSON CONTENT

Suppose four tiles numbered 1, 2, 3, and 4 are in a jar. A tile is picked and returned to the jar, 15 times in all.
The results are as follows:

Tile | Number of times picked
1 | 2
2 | 4
3 | 8
4 | 1

From the results, the average number per pick is computed by:

$$\bar{x} = \frac{1(2) + 2(4) + 3(8) + 4(1)}{15} = \frac{2 + 8 + 24 + 4}{15} = \frac{38}{15} \approx 2.53$$

This means that, on average, the number on a tile picked from the jar is 2.53. This value is not a possible
outcome of any individual pick, but it is a very important measure in statistics.

If we rewrite the calculation so that each tile number is multiplied by its probability based on the results
(the number of times it was picked divided by 15), the computation becomes:


$$\bar{x} = 1\left(\frac{2}{15}\right) + 2\left(\frac{4}{15}\right) + 3\left(\frac{8}{15}\right) + 4\left(\frac{1}{15}\right) \approx 2.53$$

The value 2.53 in the example above is called expected value, mathematical expectation, or mean of
the discrete random variable defined.
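The two computations of 2.53 (counts as weights, or tile numbers times their relative frequencies) always agree, since dividing by 15 can be done before or after adding. A small Python sketch, added as an illustration, that performs both with exact fractions:

```python
from fractions import Fraction

# Tile experiment: tile number -> number of times picked (15 picks in all).
picks = {1: 2, 2: 4, 3: 8, 4: 1}
total = sum(picks.values())  # 15

# Way 1: weighted average of the tile numbers, using the counts as weights.
mean_from_counts = Fraction(sum(x * n for x, n in picks.items()), total)

# Way 2: each tile number times its relative frequency n / 15, then added.
mean_from_probs = sum(x * Fraction(n, total) for x, n in picks.items())

print(mean_from_counts, round(float(mean_from_counts), 2))  # 38/15 2.53
print(mean_from_counts == mean_from_probs)                  # True
```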

Definition
If X is a discrete random variable with values $x_1, x_2, x_3, \ldots, x_n$ and corresponding probabilities
$f(x_1), f(x_2), f(x_3), \ldots, f(x_n)$, then the mean or expected value of X, denoted by E(X), is:

$$E(X) = x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \cdots + x_n f(x_n) = \sum x\, f(x)$$

where the sum is taken over all values of X.

What is the mean outcome if a fair die is rolled?

Answer:

Let Y be the random variable defined by the outcomes. Since the die is fair, each of the outcomes has a
probability 1/6, thus the expected value per roll is:

E(Y) = 1 (1/6) + 2 (1/6) + 3 (1/6) + 4 (1/6) + 5 (1/6) + 6 (1/6)

= 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6

= 21/6

E(Y) = 3.5
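The definition E(X) = Σ x f(x) translates directly into a one-line sum. A minimal Python sketch (the function name `expected_value` is hypothetical) that reproduces the fair-die result; the distribution is stored as a dictionary mapping each value x to f(x).

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) = sum of x * f(x) over all values x of the random variable."""
    return sum(x * fx for x, fx in dist.items())


# Fair die: each face 1..6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))         # 7/2
print(float(expected_value(fair_die)))  # 3.5
```

The same function gives 38/15 (about 2.53) for the tile distribution {1: 2/15, 2: 4/15, 3: 8/15, 4: 1/15}.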

Recall the example of picking a tile from a jar containing tiles numbered 1, 2, 3, and 4, in which the results after
15 picks are:

Tile | Number of times picked
1 | 2
2 | 4
3 | 8
4 | 1

The mean or expected value E(X) is 2.53.
To describe these results more fully, the variance of the random variable must also be known.

The variance of any random variable X, denoted by σ² or V(X), can be computed by multiplying each squared
deviation from the mean of X by its corresponding probability and adding the products; it is a
probability-weighted average of the squared deviations. This process is very similar to the way we compute the
variance of a data set (especially if the data are weighted or grouped), with the probabilities of the values of
the random variable used as weights.

That is, the variance V(X) described above is:

$$V(X) = \sum [x - E(X)]^2 f(x) \quad \text{(Formula 1 for variance)}$$

By algebraic manipulation, and using the property that the probabilities f(x) of a random variable add up to 1,
it follows that the variance can also be computed as:

$$V(X) = E(X^2) - [E(X)]^2 \quad \text{(Formula 2 for variance)}$$
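The manipulation can be written out step by step. Writing μ = E(X), and using Σ x f(x) = μ, Σ x² f(x) = E(X²), and Σ f(x) = 1 (a short derivation added here for reference):

```latex
\begin{aligned}
V(X) &= \sum [x - \mu]^2 f(x) \\
     &= \sum x^2 f(x) - 2\mu \sum x f(x) + \mu^2 \sum f(x) \\
     &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
     &= E(X^2) - [E(X)]^2
\end{aligned}
```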

Note that Formula 2 is usually used as the computational formula because Formula 1 can be more tedious to use,
especially if the mean E(X) has a decimal part.


Aside from the variance, the standard deviation is commonly used as a measure of variability. As with any set of
data, the standard deviation is the positive square root of the variance. Thus, the standard deviation of a
random variable X is

$$Sd(X) = \sqrt{V(X)}$$
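Both variance formulas and the standard deviation can be sketched in a few lines of Python. This is an illustration only (the function names `variance_formula1` and `variance_formula2` are hypothetical); it uses the tile distribution f(x) = (times picked)/15 with exact fractions, so both formulas give exactly the same value.

```python
import math
from fractions import Fraction


def variance_formula1(dist):
    """V(X) = sum of [x - E(X)]^2 * f(x)."""
    mean = sum(x * fx for x, fx in dist.items())
    return sum((x - mean) ** 2 * fx for x, fx in dist.items())


def variance_formula2(dist):
    """V(X) = E(X^2) - [E(X)]^2."""
    mean = sum(x * fx for x, fx in dist.items())
    mean_of_squares = sum(x ** 2 * fx for x, fx in dist.items())
    return mean_of_squares - mean ** 2


# Tile experiment, with f(x) = (times picked) / 15.
tiles = {1: Fraction(2, 15), 2: Fraction(4, 15), 3: Fraction(8, 15), 4: Fraction(1, 15)}

v1 = variance_formula1(tiles)
v2 = variance_formula2(tiles)
print(v1, v2)                    # 146/225 146/225
print(round(float(v1), 2))       # 0.65
print(round(math.sqrt(v1), 2))   # 0.81  (standard deviation, Sd(X) = sqrt(V(X)))
```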

Example 1:
From the experiment above, we can use x to represent the tile number and convert the "number of times
picked" to f(x) by dividing each value by 15. The table becomes:

x | f(x)
1 | 2/15
2 | 4/15
3 | 8/15
4 | 1/15

Formula 1 for variance can now be applied. Adding more columns to the table makes the
calculation easier.

x | f(x) | x f(x) | x − E(X) | [x − E(X)]² | [x − E(X)]² f(x)
1 | 2/15 | 2/15 | −1.53 | 2.341 | 0.312
2 | 4/15 | 8/15 | −0.53 | 0.281 | 0.075
3 | 8/15 | 24/15 | 0.47 | 0.221 | 0.118
4 | 1/15 | 4/15 | 1.47 | 2.161 | 0.144
 | | E(X) = 2.53 | | | V(X) = 0.65

Therefore, V(X) = 0.65, and the standard deviation is Sd(X) = 0.81.

Using Formula 2 for variance, we have:

x | f(x) | x f(x) | x² | x² f(x)
1 | 2/15 | 2/15 | 1 | 0.133
2 | 4/15 | 8/15 | 4 | 1.067
3 | 8/15 | 24/15 | 9 | 4.800
4 | 1/15 | 4/15 | 16 | 1.067
 | | E(X) = 2.53 | | E(X²) = 7.07

$$V(X) = E(X^2) - [E(X)]^2 = 7.07 - (2.53)^2 = 7.07 - 6.40 = 0.67$$

Sd(X) = 0.82

Notice that the two results differ slightly. The discrepancy comes from rounding off values during the
computations.
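The source of the discrepancy can be seen by repeating Formula 2 twice: once with exact values, and once with E(X²) and E(X) rounded to 7.07 and 2.53 first, as in the hand computation. A small Python sketch, added for illustration:

```python
from fractions import Fraction

tiles = {1: Fraction(2, 15), 2: Fraction(4, 15), 3: Fraction(8, 15), 4: Fraction(1, 15)}

mean = sum(x * fx for x, fx in tiles.items())                  # exactly 38/15
mean_of_squares = sum(x ** 2 * fx for x, fx in tiles.items())  # exactly 106/15

# Formula 2 with exact values (no rounding until the very end).
exact_var = mean_of_squares - mean ** 2

# Formula 2 as done by hand: E(X^2) rounded to 7.07 and E(X) rounded to 2.53 first.
rounded_var = 7.07 - 2.53 ** 2

print(round(float(exact_var), 4))  # 0.6489
print(round(rounded_var, 4))       # 0.6691
```

Rounding only at the end gives 0.65, matching Formula 1; rounding the mean first and then squaring it pushes the result up to about 0.67.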

V. PRACTICE

A. Xander is paid P20 whenever the result of tossing two coins is both heads but pays P10 whenever the
result is not both heads. What is his expected gain per toss?

Let X be the random variable representing Xander's gain per toss. There are 4 outcomes in tossing two coins, of
which only 1 is HH; the other outcomes are HT, TH, and TT. The probability of both heads is ¼, while the probability
of not both heads is ¾. Therefore, Xander's expected gain per toss is:

E(X) = (P20)(1/4) + (-P10)(3/4)


= P5 – P7.50

E(X) = ?????


B. The probability distribution below shows the number of typing errors (y) and the probability ƒ(y) of committing
these errors whenever clerks type in a document.
Compute the variance and standard deviation.

y | 0 | 1 | 2 | 3 | 4 | 5
ƒ(y) | 0.02 | 0.11 | 0.42 | 0.31 | 0.10 | 0.04

VI. ENRICHMENT

1. Find the expected number of Jemar's monthly absences based on his previous records of absences, as
presented in the probability distribution below.

Number of Absences | Percent
0 | 25%
1 | 30%
2 | 30%
3 | 15%

2. Find the variance of the number of Jemar's monthly absences based on his previous records of absences, as
presented in the probability distribution below.

Number of Absences | Percent
0 | 25%
1 | 30%
2 | 30%
3 | 15%

VII. EVALUATION

A. Determine the mean or expected value of each random variable.

1.
X 0 1 2 3 4
P(x) 1/5 1/5 1/5 1/5 1/5

2.
Y 1 2 3
P(y) 1/2 1/6 1/3

3.

Z 3 5 7 9
P(z) 0.6 0.1 0.2 0.1

4.

R 10 20
P(r) 3/7 4/7


5.

S 3 4 12 20
P(s) 0.1 0.5 0.2 0.2

6.

S 4 6 8 10
P(s) 1/12 3/12 7/12 1/12

7.
Y 1 2 3
P(y) 20% 30% 50%

8.

S 3 6 9 10 15
P(s) 0.25 0.30 0.05 0.20 0.20

B. Solve the following problems.

1. The random variable X, representing the number of nuts in a chocolate bar, has the following probability
distribution. Compute the mean.

X 0 1 2 3 4
P(x) 1/10 3/10 3/10 2/10 1/10

2. Find the mean of the random variable Y representing the number of red M&M's chocolates per 160-gram
pack, which has the following probability distribution.

Y 3 4 12 20
P(y) 0.1 0.5 0.2 0.2

3. Find the mean of the random variable Z representing the number of male teachers per elementary school.

Z 3 4 5 6 7
P(z) 40% 32% 11% 9% 8%

4. Find the expected number of times a baby wakes his/her mother after midnight, given the following
probability distribution.

X 1 2 3 4 5
P(x) 0.12 0.25 0.45 0.1 0.08

Week # ____4______

LESSON 4: Normal Distribution


I. INTRODUCTION
One of the most important and widely used topics in statistics is the normal distribution. In this lesson, we will discuss
different ways of determining what kind of distribution is represented by a given set of data. You will learn
how to find the area under the normal curve and experience the usefulness of the normal distribution in solving
real-life problems.

II. LESSON OBJECTIVES


a. Demonstrate an understanding of the key concepts of the normal distribution.
b. Construct a normal curve.
c. Apply the concepts involving normal random variables in solving real-life problems.

III. PRE-ASSESSMENT


Determine the area BELOW the following:
____________________1. z = 2
____________________2. z = 3.1
____________________3. z = 1.5
____________________4. z = 2.14
____________________5. z = -2.8
____________________6. z = -2.15
____________________7. z = -0.12
____________________8. z = 1.67
____________________9. z = -0.76
____________________10. z = 0.1

IV. LESSON CONTENT


The most important of all continuous probability distributions is the normal distribution. Its graph, called the
normal curve, is a bell-shaped curve. It lies entirely above the horizontal axis. It is symmetric about the mean,
unimodal, and asymptotic to the horizontal axis.

The area between the curve and the horizontal axis is exactly equal to 1. Half of the area lies to the right of the
mean and the remaining half lies to the left of the mean.

Standard Normal Distribution

There are many normal distributions. A normal distribution is determined by two parameters: the mean μ and the
standard deviation σ. If the mean μ is 0 and the standard deviation σ is 1, the normal distribution is the
standard normal distribution, and the areas under its curve can be found using the Areas under the Normal Curve
Table.

However, the mean μ is not always equal to 0 and the standard deviation σ is not always equal to 1. In the normal
curve below, μ = 40 and σ = 12.


In the normal curve below, μ = 75 and σ = 10.

Suppose two normal curves with the same standard deviation but different means are sketched on the same
horizontal axis.

Notice that if the mean μ is changed from 55 to 39, the curve shifts to the left but its shape remains the same.

Suppose the normal curves have the same mean but different standard deviations.


Notice that the normal curve with σ = 20 is flatter than the one with σ = 10.

Suppose the curves have different means and different standard deviations.

Notice that they are centered at different positions on the horizontal axis. The normal curve on the left is flatter
and spreads out further. This is because it has a larger standard deviation.

Areas under the Normal Curve

Areas under the normal curve can be found using the Areas under the Standard Normal Curve table. Each entry in
the table is the area of the region under the standard normal curve between z = 0 and a given value of z.
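Table entries can also be approximated on a computer. The area to the left of z under the standard normal curve equals (1 + erf(z/√2))/2, and Python's standard `math.erf` provides the error function. A minimal sketch (the names `phi` and `area_zero_to` are hypothetical), assuming a table that lists the area between 0 and z as in the examples that follow:

```python
import math


def phi(z):
    """Area under the standard normal curve to the LEFT of z (the CDF)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_zero_to(z):
    """Area between z = 0 and z, the kind of value listed in a z-table."""
    return abs(phi(z) - 0.5)


print(round(area_zero_to(1.54), 4))  # 0.4382
print(round(phi(0), 4))              # 0.5, half the area lies left of the mean
```

`phi(z)` directly gives the area BELOW z asked for in the pre-assessment; `area_zero_to(z)` matches the table format, and by symmetry a negative z has the same area as its positive counterpart.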


EXAMPLE 1: Find the area between z = 0 and z = 1.54

SOLUTION 1:

Step 1. Sketch the normal curve. Since 1.54 is positive, it lies to the right of 0.

Step 2. Locate the area for z = 1.54 in the Areas under the Standard Normal Curve Table. Proceed down the
column marked z until you reach 1.5. Then proceed to the right along this row until you reach the column
marked 4. The area is the entry at the intersection of this row and column: 0.4382.

EXAMPLE 2: Find the area between z = 1.52 and z = 2.5.

SOLUTION 2:

Step 1. Sketch the normal curve.

Step 2. Let A = area between z = 1.52 and z = 2.5
A1 = area between z = 0 and z = 1.52
A2 = area between z = 0 and z = 2.5

From the table,
A1 = 0.4357
A2 = 0.4938

A = A2 − A1 = 0.4938 − 0.4357 = 0.0581

Hence, the area between z = 1.52 and z = 2.5 is 0.0581.
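The subtraction A = A2 − A1 can be repeated with computed areas instead of table values. A short Python sketch (illustrative only; `phi` is a hypothetical helper built on the standard `math.erf`) following the same steps as Example 2:

```python
import math


def phi(z):
    """Standard normal CDF: area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Area between z = 1.52 and z = 2.5, computed as in Example 2:
a1 = phi(1.52) - 0.5   # area between 0 and 1.52 (table: .4357)
a2 = phi(2.5) - 0.5    # area between 0 and 2.5  (table: .4938)
area = a2 - a1

print(round(a1, 4), round(a2, 4))  # 0.4357 0.4938
print(round(area, 3))              # 0.058
```

Computed without intermediate rounding, the area is about 0.058; its fourth decimal place can differ from the table answer because each table entry is itself rounded to four places.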

APPLICATIONS OF NORMAL DISTRIBUTION


The standard score, or z-score, measures how many standard deviations a given value x is above or
below the mean:

$$z = \frac{x - \mu}{\sigma}$$

➢ A positive z-score indicates that the score or observed value is above the mean.
➢ A negative z-score indicates that the score or observed value is below the mean.

EXAMPLE 1: The scores of the students in the midyear examination for Mathematics have a mean (μ) of 32 and
a standard deviation (σ) of 5. Find the z-score corresponding to each of the following:
a) 37
b) 22
c) 33
d) 28


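SOLUTIONS:

a) $z = \frac{x - \mu}{\sigma} = \frac{37 - 32}{5} = \frac{5}{5} = 1$

b) $z = \frac{x - \mu}{\sigma} = \frac{22 - 32}{5} = \frac{-10}{5} = -2$

c) $z = \frac{x - \mu}{\sigma} = \frac{33 - 32}{5} = \frac{1}{5} = 0.2$

d) $z = \frac{x - \mu}{\sigma} = \frac{28 - 32}{5} = \frac{-4}{5} = -0.8$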
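The z-score formula used in the solutions can be written as a small function. This Python sketch recomputes parts a) to d) from μ = 32 and σ = 5; the function name `z_score` is hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

MU, SIGMA = 32, 5   # midyear examination scores from Example 1
for score in (37, 22, 33, 28):
    print(score, z_score(score, MU, SIGMA))
# prints: 37 1.0 / 22 -2.0 / 33 0.2 / 28 -0.8 (one pair per line)
```

The sign of the result tells you the side of the mean, matching the two bullet rules above.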

V. PRACTICE

I. Find the area under the normal curve in each of the following cases.

1. Between z = 0 and z = 1.63


2. Between z = 1.56 and z = 2.51
3. Between z = -2.46 and z = 1.55
4. Between z = -0.76 and z = 1.35
5. Between z = -2.76 and z = -1.25
6. To the right of z = 2.35
7. To the left of z = 1.85
8. To the right of z = -2.85
9. To the left of z = 0.89
10. To the right of z = -0.75

VI. ENRICHMENT

A. A light bulb manufacturer knows that the lifetime of its light bulbs is normally distributed with a mean of 2150 hours and a standard deviation of 75 hours.
1. What proportion of the light bulbs have a lifetime exceeding 2000 hours?
2. If the standard deviation remains the same, find the mean lifetime needed so that 98% of the light bulbs will last more than 2000 hours.
3. If the mean lifetime remains at 2150 hours, find the standard deviation needed so that 98% of the light bulbs will last more than 2000 hours.

B. At R&B Manufacturing Company, workers produce an average of 250 units of its product per person per day with a standard deviation of 25 units. To raise the productivity level, management announced that there will be an incentive pay for the top 20% of producers.
1. If a worker is chosen at random, what is the probability that the worker can produce:
a. more than 270 units per day
b. less than 260 units per day
2. What is the minimum number of units that a worker should produce in order to qualify for the
incentive pay?

C. Copper rods are mass-produced at XYZ Factory. A customer ordered rods 45 cm long on the condition that a rod is acceptable if its length lies between 44.95 cm and 45.05 cm. On testing the rods supplied to him, the customer finds that 5% are undersized and 10% are oversized. If the lengths of the rods are normally distributed, find the mean length and the standard deviation.


VII. EVALUATION

A. Given a normal distribution with a mean of 42 and standard deviation of 6, find the area BELOW.
(SHOW YOUR SOLUTION)

1. 36

2. 54

3. 38

4. 60

5. 58

B. Given a normal distribution with a mean of 125 and standard deviation of 15, find the area ABOVE.
(SHOW YOUR SOLUTION)

6. 128

7. 119

8. 158

9. 100

10. 120

C. Given a normal distribution with a mean of 24 and standard deviation of 4, find the area BETWEEN the following:

11. 28 and 30

12. 12 and 38

13. 16 and 22

14. 19 and 31

15. 17 and 24


Week # ___5______

LESSON 5: SAMPLING AND SAMPLING DISTRIBUTION

I. INTRODUCTION
Oftentimes, in our research or even in daily activities, we are concerned with a large group of people or objects, known as the population. It is difficult, and sometimes impossible, to deal with every member of this large group. In such cases, the remedy is to select a portion of the population, known as a sample; this process is called sampling.

II. LESSON OBJECTIVES


a. Illustrate sampling and sampling distributions.
b. Construct a sampling distribution.
c. Identify the sampling distribution.

III. PRE- ASSESSMENT


The Pair Game
Consider a deck of 52 cards with 4 suits (Spade, Diamond, Heart, and Club). Each suit has 3 face cards (King, Queen, and Jack) and 10 number cards (1 to 10). A card numbered 1 is also called an ace. The game is played by a group of students and consists of drawing a pair of cards from the deck. To get a point, the pair drawn must consist of a face card and a number card. The player with the most pairs in a deal of 6 cards wins. The students would like to know the following:

1. How many possibilities are there in drawing a pair?
2. How many possible pairs are there?
3. List 10 possible pairs that can be drawn.
4. What is the probability of getting a King paired with a card numbered 10?

IV. LESSON CONTENT

Sampling
Oftentimes, in our research or even in daily activities, we are concerned with a large group of people or objects known as the population. It is very difficult, and sometimes impossible, to deal with every member of this large group. In such cases, the remedy is to select a portion of the population known as a sample. This process is called sampling. One of the best sampling methods, and the one usually used in research, is called random sampling.

Parameter and Statistic

If the population we are concerned with is finite or small in number, say the 25 captive-bred Philippine Eagles successfully produced by the Philippine Eagle Foundation (PEF) as of October 15, 2015, then we can easily describe it. Every measurement or quantity that represents a general characteristic of this population, say an average height of 2.5 meters for these 25 captive-bred raptors, is called a parameter.

On the other hand, if we are dealing with a very large population and have resorted to sampling, then every measurement or quantity that describes a characteristic of the sample is called a sample statistic, or simply a statistic.


Sampling Distribution Of The Sample Mean

Suppose a jar contains the numbers 1, 3, and 5. If we take two numbers in succession with replacement, then the possible 2-number samples are: (1,1), (3,3), (5,5), (1,3), (3,1), (1,5), (5,1), (3,5), and (5,3). The means of these pairs, in that order, are 1, 3, 5, 2, 2, 3, 3, 4, and 4. If we denote the sample mean as the random variable X̄, then

X̄ = {1, 2, 3, 4, 5}

As we can see, P(1) = 1/9, P(2) = 2/9, P(3) = 3/9 or 1/3, P(4) = 2/9, and P(5) = 1/9.

Therefore, the probability distribution of X̄ is:

X̄       1     2     3     4     5
f(X̄)    1/9   2/9   1/3   2/9   1/9

The probability distribution above represents the means of the samples, that’s why the distribution is now called
Sampling Distribution of the Sample Means.

Example 1:
In order to test the effect of a new drug on humans, 20 patients were given a dose. After a minute, it was found that their body temperature had decreased by 2°C on average. Answer the following:
a). Are the 20 patients mentioned above a population or a sample?

b). Is the 2°C decrease in body temperature considered a parameter or a statistic?

Answer:

a. The 20 patients are considered a sample.

b. Since the measurement 2°C refers to the average decrease among the 20 patients (a sample), it is considered a statistic.

Example 2

Construct the sampling distribution of the sample means when two dice are rolled.

Answer:

If we construct a table of the means of the two results, it would be:

Die 1 \ Die 2   1     2     3     4     5     6
1               1     1.5   2     2.5   3     3.5
2               1.5   2     2.5   3     3.5   4
3               2     2.5   3     3.5   4     4.5
4               2.5   3     3.5   4     4.5   5
5               3     3.5   4     4.5   5     5.5
6               3.5   4     4.5   5     5.5   6


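Counting how often each mean occurs among the 36 equally likely outcomes gives the sampling distribution of the sample means:

X̄       1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
f(X̄)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36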
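Listing all equally likely samples by hand, as in the tables above, is easy to automate. The following Python sketch assumes the samples are ordered and drawn with replacement, which matches both the dice example and the earlier jar example (1, 3, 5); the function name `sampling_distribution` is hypothetical. Exact fractions are used so the probabilities can be compared with the tables, although `Fraction` reduces 2/36 to 1/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sampling_distribution(population, n):
    """Probability distribution of the sample mean over all ordered samples
    of size n drawn with replacement, each sample equally likely."""
    samples = list(product(population, repeat=n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = len(samples)
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}

# Two dice: each sample is a pair of rolls from 1..6.
dice = sampling_distribution(range(1, 7), 2)
for mean, p in dice.items():
    print(float(mean), p)
```

For the jar example, `sampling_distribution([1, 3, 5], 2)` gives the probabilities 1/9, 2/9, 1/3, 2/9, and 1/9 for the means 1 to 5.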

V. PRACTICE

Construct the sampling distribution of the sample means and answer the questions that follow:

A jar contains the numbers 1, 2, 3, and 4. Construct the sampling distribution of the sample means when two numbers are taken from the jar with replacement.
1. What is the probability that the mean of the numbers is 2.5?

2. What is the probability that the mean of the numbers is less than 2?

3. What is the probability that the mean of the numbers is greater than 1.5?

4. What is the probability that the mean of the numbers is between 1.5 and 5?
5. Construct the histogram of the sampling distribution.

VI. ENRICHMENT

The totality of subjects (people, animals, or objects) under consideration is called the population. The portion chosen from a population is called a sample, and the process of taking samples is called sampling.

Random sampling refers to the sampling technique in which each member of the population is given an equal chance to be chosen as part of the sample. The lottery method, drawing lots, or the use of random numbers can be used to accomplish random sampling.

The measurement or quantity that describes the population is called a parameter, while the measurement or quantity that describes the sample is called a statistic.

VII. EVALUATION

Construct the sampling distribution of the sample means and answer the questions that follow:

A. Adrian Cedrick receives a grade of either 82 or 83 in each of his three major subjects.

Construct the sampling distribution of his mean grade.

1. What is the probability that his mean grade is lower than 83?

2. What is the probability that his mean grade is greater than 82.33?

3. What is the probability that his mean grade is 82.67?

4. What is the probability that his mean grade is between 82.33 and 83?

5. Construct the histogram of the sampling distribution of the mean grade.


B. Each of three containers contains the numbers 0, 1, and 2. Construct the sampling distribution of the sample mean when one number is taken from each container.

1. What is the probability that the sample mean is less than 1?

2. What is the probability that the sample mean is greater than 0.67?

3. What is the probability that the sample mean is 1?

4. What is the probability that the mean is between 1 and 2?

5. Construct the histogram of the sampling distribution of the sample means.

C. Determine whether the given subject is a population or a sample, then describe the given quantity as a parameter or a statistic:

1. The average grade of the whole class under study is 82.15.


Whole class: _______________________
Average grade (82.15): _____________________

2. 50 of the 200 animals in the zoo were selected and weighed. The variance of their weights is 12.5 kg².
50 animals: __________________________
Variance (12.5 kg²): ___________________

3. The standard deviation of the life span of a species endemic to the Philippines is 2.3 years.
A species endemic to the Philippines: ____________________
Standard Deviation (2.3 years): _________________________

4. Based on a survey conducted among 1200 respondents, 1 out of 3 Filipinos can't live without a cell phone.

1200 respondents: ____________________________

1 out of 3 Filipinos can't live without a cell phone: __________________________

5. Based on the US National Hospital Discharge Record in 2010, the average length of stay of patients in US hospitals is 4.8 days.

Patients: ________________________________

Average stay (4.8 days): _______________________________


Week # ______6____

LESSON 6: Mean and Variance of Sampling Distribution

I. INTRODUCTION
In probability distributions, it is important that we know the mean and variance or standard deviation of the
sampling distribution, specifically of the sample means. This lesson includes the discussion of the mean and
variance as well as the standard deviation of the sampling distribution of the sample means.

II. LESSON OBJECTIVES


a. Understand the concept of a sampling distribution.
b. Determine the mean and variance of a sampling distribution.
c. Compute the mean and the variance of a sampling distribution.

III. PRE- ASSESSMENT


Consider a population of N = 5 numbers: 2, 4, 6, 8, and 10. If a random sample of size n = 3 is selected without replacement, complete the following table.
Sample    Sample Data (Sample Values)    Sample Mean
1
2
3
4
5
6
7
8
9
10

IV. LESSON CONTENT

Mean of Sampling Distribution of the Sample Means

If a sample of size n is taken from a population with mean µ and variance σ², then the sample mean is:

$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$

THEOREM

If all possible random samples of size n are taken with replacement (independent) from a population with mean µ and variance σ², then the mean (µx̄), variance (σ²x̄), and standard deviation (σx̄) of the sampling distribution of the sample mean are:

$\mu_{\bar{x}} = \mu$ (mean)

$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$ (variance)

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (standard deviation or standard error)

If all possible samples of size n are taken without replacement (dependent) from a finite population of size N with mean µ and variance σ², then the mean (µx̄), variance (σ²x̄), and standard deviation (σx̄) of the sampling distribution of the sample mean are:

$\mu_{\bar{x}} = \mu$ (mean)

$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$ (variance)

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ (standard deviation or standard error)

Note: The factor $\frac{N - n}{N - 1}$ is called the finite population correction factor. It is close to 1 and can safely be ignored when n is small compared to N.

Note: As we increase the sample size, the variance of the sample mean decreases.
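The formulas above can be collected into one helper. This Python sketch is illustrative: the name `sample_mean_moments` and the numbers in the usage lines (μ = 50, σ² = 100, n = 25, N = 101) are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def sample_mean_moments(mu, sigma2, n, N=None):
    """Return (mean, variance, standard error) of the sampling distribution of x̄.

    N=None: sampling with replacement (independent draws), so variance = σ²/n.
    N given: sampling without replacement from a finite population of size N,
    so σ²/n is multiplied by the correction factor (N - n)/(N - 1).
    """
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    var = sigma2 / n
    if N is not None:
        if N < 2 or n > N:
            raise ValueError("need N >= 2 and n <= N")
        var *= (N - n) / (N - 1)
    return mu, var, math.sqrt(var)

print(sample_mean_moments(50, 100, 25))          # (50, 4.0, 2.0)
print(sample_mean_moments(50, 100, 25, N=101))   # variance 4.0 x 76/100, about 3.04
```

With N = 101 and n = 25 the correction factor is 76/100 = 0.76, which visibly shrinks the variance; when N is much larger than n the factor is close to 1, as the note says.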

Example :

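From our previous example, suppose a jar contains the numbers 1, 3, and 5, and two numbers are drawn in succession with replacement (n = 2).

Show that $\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.
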
Answer:

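The probability distribution of the population is:

X       1     3     5
f(x)    1/3   1/3   1/3

Since the distribution is uniform, that is, the observations have the same probabilities, the population mean (µ) and population variance (σ²) can be easily computed as:

$\mu = 1\left(\frac{1}{3}\right) + 3\left(\frac{1}{3}\right) + 5\left(\frac{1}{3}\right) = \frac{9}{3} = 3$

$\sigma^2 = \frac{1}{3}(1 - 3)^2 + \frac{1}{3}(3 - 3)^2 + \frac{1}{3}(5 - 3)^2 = \frac{4}{3} + 0 + \frac{4}{3} = \frac{8}{3}$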

If we take two numbers in succession with replacement, then the possible 2-number samples are: (1,1), (3,3), (5,5), (1,3), (3,1), (1,5), (5,1), (3,5), and (5,3). The means of these pairs, in that order, are 1, 3, 5, 2, 2, 3, 3, 4, and 4.

The probability distribution of x̄ becomes:

x̄       1     2     3     4     5
f(x̄)    1/9   2/9   3/9   2/9   1/9

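The mean of the sampling distribution (using the formula for the mean of a random variable) is:

$\mu_{\bar{x}} = \sum \bar{x} f(\bar{x})$

$= 1\left(\frac{1}{9}\right) + 2\left(\frac{2}{9}\right) + 3\left(\frac{3}{9}\right) + 4\left(\frac{2}{9}\right) + 5\left(\frac{1}{9}\right)$

$= \frac{1}{9} + \frac{4}{9} + \frac{9}{9} + \frac{8}{9} + \frac{5}{9}$

$= \frac{27}{9}$

$= 3$, which equals µ = 3.

Therefore, $\mu_{\bar{x}} = \mu$.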


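The variance of the sampling distribution (using the formula for the variance of a random variable) is:

$\sigma^2_{\bar{x}} = E(\bar{X}^2) - [E(\bar{X})]^2$

$= \sum \bar{x}^2 f(\bar{x}) - \left[\sum \bar{x} f(\bar{x})\right]^2$

$= \left[1^2\left(\frac{1}{9}\right) + 2^2\left(\frac{2}{9}\right) + 3^2\left(\frac{3}{9}\right) + 4^2\left(\frac{2}{9}\right) + 5^2\left(\frac{1}{9}\right)\right] - (3)^2$

$= \left(\frac{1}{9} + \frac{8}{9} + \frac{27}{9} + \frac{32}{9} + \frac{25}{9}\right) - 9$

$= \frac{93}{9} - 9$

$= \frac{4}{3}$, but $\sigma^2 = \frac{8}{3}$ and $\frac{4}{3} = \frac{8/3}{2}$.

Since the sample size n is 2, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{2}$.

Therefore, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.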
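The same verification can be done by brute force: enumerate all nine samples, then compute the mean and variance of their means directly. This Python sketch uses exact fractions so that the results match 3, 8/3, and 4/3 exactly; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 5]
n = 2

# Population mean and variance (each value has probability 1/3).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All 9 ordered samples of size 2 drawn with replacement, and their means.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)

print(mu, sigma2)          # 3 8/3
print(mu_xbar, var_xbar)   # 3 4/3
assert mu_xbar == mu and var_xbar == sigma2 / n
```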
n

V. PRACTICE
Determine the mean (µx̄), variance (σ²x̄), and standard deviation (σx̄) of each. (Show your solution.)

1. A random sample of size 4 is taken with replacement from a population with µ = 12 and σ² = 8.
µx̄ = _____ σ²x̄ = ______ σx̄ = ________

2. An independent random sample of size 9 is taken from a population with µ = 25.2 and σ² = 12.
µx̄ = ______ σ²x̄ = ______ σx̄ = __________

3. A random sample of size 25 is taken with replacement from a population with µ = 121.4 and σ² = 50.5.
µx̄ = _______ σ²x̄ = _______ σx̄ = __________

4. An independent random sample of size 100 is taken with replacement from a population with µ = 72 and σ² = 25.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________

5. A random sample of size 40 is taken with replacement from a population with µ = 82.4 and σ² = 60.

µx̄ = ____ σ²x̄ = ______ σx̄ = __________

6. A random sample of size 3 is taken with replacement from a population with µ = 8 and σ² = 2.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________

7. A random sample of 20 independent observations is taken from a population with µ = 48 and σ² = 5.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________

8. A random sample of size 30 is drawn with replacement from a population with µ = 48 and σ² = 6.5.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________

9. A random sample of size 1600 is taken with replacement from a population with µ = 509.23 and σ² = 40.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________

10. A random sample of size 120 is taken with replacement from a population with µ = 120 and σ² = 28.
µx̄ = ____ σ²x̄ = ______ σx̄ = __________


VI. ENRICHMENT

Compute the mean (µx̄), variance (σ²x̄), and standard deviation (σx̄) of the sampling distribution taken from each of the following populations.

1. When 25 numbers are drawn with replacement from a jar containing 1, 3, 5, and 6.
2. When a sample of size 9 is taken with replacement from the population 1, 1, 2, 2, 2, 3, 3, 4.
3. When 36 samples are taken with replacement from the population 7, 6, 6, 6, 5, 4, 3, 1, 1, and 1.
4. When an unbiased die is rolled 50 times.
5. When a biased die, whose even numbers come up twice as often as the odd numbers, is rolled 16 times.
6. A community has 1500 people with a mean age of 42 and a variance of 16. If you draw a random sample of 30 people, what are the mean, variance, and standard error of the sampling distribution of their ages?
7. What are the mean, variance, and standard error of the sample mean when 60 students are taken from a population of 2000 with a mean score of 75 and a standard deviation of 5?
8. The mean sugar level of 1000 patients in XYZ Hospital is 150 mg/dL with a variance of 64. If 50 of them were taken as samples, what are the mean, variance, and standard error of the sampling distribution?
9. The mean IQ of 1000 students of AJ University is 98 with a standard deviation of 4. If 100 of them were taken as samples, what are the mean, variance, and standard error of the sampling distribution?
10. The mean monthly salary of the 1440 employees of Ragos Electrical Company is P20,000 with a standard deviation of P800. If 40 of them were randomly selected, what are the mean, variance, and standard error of the sampling distribution?

VII. EVALUATION

Compute the mean (µx̄), variance (σ²x̄), and standard deviation (σx̄) of each sampling distribution.

1. When 100 independent samples are taken

X 1 2 3 4 5

ƒ(x) 1/12 3/12 4/12 3/12 1/12

2. When 4 independent samples are taken

X 0 1 2 3 4

ƒ(x) 1/5 1/5 1/5 1/5 1/5

3. When 25 samples are drawn with replacement

Y 1 2 3

P(y) 1/2 1/6 1/3


4. When 64 samples are drawn with replacement

z 3 5 7 9

Ƒ(z) 0.6 0.1 0.2 0.1

5. When 36 samples are drawn with replacement

r 10 20
Ƒ(r) 3/7 4/7

6. When 81 samples are taken with replacement

s 3 4 12 20

Ƒ(s) 0.1 0.5 0.2 0.2

7. When 9 independent samples are taken

t 5 10 20

P(t) 50% 12% 38%

8. When 20 samples are taken with replacement

V -1 0 1

P(v) 0.3 0.2 0.5

9. When 1000 samples are taken with replacement

m -5 -2 2 4

Ƒ(m) 40% 25% 15% 20%

10. When 40 independent samples are taken

K 0.1 0.2 0.3 0.4

Ƒ(k) 0.1 0.2 0.3 0.4


Week # ___7______

LESSON 7: Sampling Distribution of Large Sample Size (n ≥ 30)


Sampling Distribution of Small Sample Size (n < 30)

I. INTRODUCTION
Sampling distributions are important in understanding statistical inference. Statistical inference techniques are based on the concept of the sampling distribution of a statistic. Sampling distributions allow us to answer questions about samples, and they provide the foundation for statistical inference procedures.
In this lesson, you will learn the concept of a sampling distribution and its applications.

II. LESSON OBJECTIVES


a. Understand the concept of the sampling distribution for small and large sample sizes.
b. Learn what the sampling distribution of x̄ is when the sample size is large.
c. Learn what the sampling distribution of x̄ is when the population is normal.

III. PRE- ASSESSMENT

Let X1, X2, X3, ..., X9 be independent normal random variables with mean μX = 3 and standard deviation σX = 2. Let X̄ be the mean of these 9 random variables, namely

$\bar{X} = \frac{X_1 + X_2 + \cdots + X_9}{9}$

(a) What is the shape of the distribution of 𝑋̅?

(b) What is the mean of the distribution of 𝑋̅?

(c) What is the standard deviation of the distribution of 𝑋̅?

(d) Can we determine P(𝑋̅ < 2.5) using a z-score? You do not need to compute this probability,
just answer yes or no and briefly explain why or why not.

IV. LESSON CONTENT


Large Sample Size:
Statisticians consider a sample of size 30 or more as large. If a large sample is taken from a population with mean μ and standard deviation σ, then the sampling distribution of the sample mean approaches the normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, and thus can be standardized as:

$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Theorem
If random samples of size n are taken from a population with mean μ and standard deviation σ, then the sampling distribution of the sample mean $\bar{X}$ approaches the normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, and thus can be standardized as

$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

As n increases, the sampling distribution of the sample mean gets nearer and nearer to the normal distribution.
Note:
➢ If σ is unknown, compute the sample standard deviation s and use it in place of σ in the formula, provided that n ≥ 30.
➢ Even if n < 30, the formula can still be used provided that the population is approximately normal and the population standard deviation σ is known.

Example 1.
The heights of pupils in Luna Elementary School have a mean of 121 cm with a standard deviation of 5 cm. If 50 of them are taken as samples, what is the probability that their mean height is less than 120 cm?

Answer:
From the problem, μ = 121, σ = 5, x̄ = 120, and n = 50. Using the formula:


$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 121}{5/\sqrt{50}} = \frac{-1}{0.707} = -1.41$

Using the z-table, the area below z = -1.41 is 0.0793. Thus, the probability that the mean height of the sample is less than 120 cm is 0.0793 or 7.93%.
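The calculation in Example 1 can be checked with a short script. This Python sketch assumes the normal area is computed from `math.erf` rather than read from the z-table, and the helper name `phi` is hypothetical. It shows both the table-style answer (z rounded to two decimals first) and the unrounded one.

```python
import math

def phi(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values from Example 1 (heights of pupils)
mu, sigma, n, xbar = 121, 5, 50, 120

standard_error = sigma / math.sqrt(n)   # σ/√n, about 0.707
z = (xbar - mu) / standard_error        # about -1.414

print(round(z, 2))                  # -1.41
print(round(phi(round(z, 2)), 4))   # 0.0793, same as the z-table lookup
print(round(phi(z), 4))             # 0.0786 when z is not rounded first
```

The small gap between 0.0793 and 0.0786 comes only from rounding z to -1.41 before using the table.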

Small Sample Size:


If the sample size is less than 30 (n < 30), it is considered small. For a small sample, the normality of the sampling distribution of the sample mean cannot be guaranteed unless the population itself is approximately normal, so in general the formula for standardizing the sample mean, and with it the z-table, cannot be used, even if the variance of the population is given.

Theorem
If x̄ and s are the mean and standard deviation, respectively, of a random sample of size n taken from a normally distributed population with mean μ, then

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

is a value of a random variable T following the t-distribution with n - 1 degrees of freedom.

Note:
➢ The formula is used when n < 30 and the population standard deviation is unknown.

Recall: The sample standard deviation is computed as:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
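The sample standard deviation (divisor n - 1) and the t-value can be computed as follows. This Python sketch is illustrative: the data [4, 6, 8] and μ = 5 are hypothetical, and the helper names `sample_sd` and `t_statistic` are made up for this example. The standard library's `statistics.stdev` uses the same n - 1 divisor, so it serves as a cross-check.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: sqrt(Σ(x - x̄)² / (n - 1))."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def t_statistic(data, mu):
    """t = (x̄ - μ) / (s / √n), assuming the population is normal."""
    n = len(data)
    xbar = sum(data) / n
    return (xbar - mu) / (sample_sd(data) / math.sqrt(n))

data = [4, 6, 8]                                  # hypothetical sample
print(sample_sd(data), statistics.stdev(data))    # 2.0 2.0
print(t_statistic(data, mu=5))                    # about 0.866
```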

The T- distribution

The t-distribution, like the z-distribution (normal distribution), is bell-shaped and symmetric about the y-axis. Compared with the z-distribution, the t-distribution is more variable, since its value depends on the fluctuations of the mean and the variance from sample to sample. Notice that the formula for s or s² uses the divisor n - 1 instead of n; this quantity is called the degrees of freedom, df. This means that there is a different t-distribution for each value of df. Since it is not practical to tabulate the whole t-distribution for every df from df = 1 to df = 28, only the t-values for some special areas, such as 0.005, 0.001, 0.025, etc., are listed. These special areas are denoted by α. If α = 0.05, it refers to an area of 0.05, or 5%, in the right tail of the t-curve for any df. The notation $t_{\alpha, df}$ is a convenient way of writing the t-value at a given α and df. The notation $t_{\alpha=0.05,\, df=20}$ means the t-value corresponding to α = 0.05 and df = 20. To look up this value in the t-table, first locate α on the top row, then df in the leftmost column. The intersection of α = 0.05 and df = 20 is 1.725. Thus, t = 1.725.

Example:
What is the t-value when n = 22 and α = 0.01?
Answer: df = n - 1 = 21, so from the t-table, t = 2.518.

V. PRACTICE

Determine the value of t based on the given sample size n and 𝛼.


1. n = 25, α = 0.05
2. n = 16, α = 0.01
3. n = 27, α = 0.025
4. n = 9, α = 0.1
5. n = 11, α = 0.05
6. n = 17, α = 0.005
7. n = 6, α = 0.01
8. n = 4, α = 0.0025
9. n = 19, α = 0.05
10. n = 25, α = 0.01


VI. ENRICHMENT

Solve the following problems:

A. Assume that the heights of adult women are normally distributed with a mean of 63 in and a standard deviation of 2.5 in.
1. If 36 women are randomly selected, what is the probability that their mean height is less than 62 in?
2. If 70 women are taken as samples, what is the probability that their mean height is greater than 62.5 in?
3. If 100 women are randomly selected, what is the probability that their mean height is between 63.2 in and 63.8 in?

B. Replacement times of TV sets are reported to follow a normal distribution with a mean of 8.5 years and a standard deviation of 1.2 years.
4. If 30 TV sets are selected at random, what is the probability that the mean replacement time is less than 8 years?
5. If 20 TV sets are taken as samples, what is the probability that the mean replacement time is longer than 7.8 years?
6. If 25 TV sets are selected, what is the probability that the mean replacement time is between 8.4 years and 9 years?

C. American teenage girls are reported to spend an average of $31 on shopping per month, with a standard deviation of $8. If these expenses are normally distributed, answer the following.
7. If 85 American teenage girls are randomly selected, what is the probability that their mean monthly shopping expense is less than $30?
8. If 60 American teenage girls are selected, what is the probability that their mean shopping expense is greater than $32.5?
9. If 90 American teenage girls are randomly selected, what is the probability that their mean expense is between $30.5 and $32?
10. If 5 of these teenage girls are asked about their monthly shopping expenses, what is the probability that their mean expense is between $28.7 and $35.8?

VII. EVALUATION

A. Compute the z-value for each; assume that each population is normally distributed.
1. µ = 100, σ = 2, x̄ = 100.5, and n = 80
2. µ = 62, σ = 6, x̄ = 59, and n = 30
3. µ = 140, σ = 14, x̄ = 145, and n = 12
4. µ = 46, σ² = 9, x̄ = 45.5, and n = 20
5. µ = 245, σ² = 20, x̄ = 248, and n = 25
6. µ = 45, s = 6, x̄ = 46.5, and n = 55
7. µ = 12.5, s = 5, x̄ = 11.8, and n = 50
8. µ = 156, s = 18.5, x̄ = 159, and n = 40
9. µ = 87, s² = 30, x̄ = 86.2, and n = 33
10. µ = 75, s² = 18, x̄ = 73.2, and n = 48

B. Compute the t-value for each:

1. Find the t-value when µ = 42, x̄ = 43, s = 5, and n = 20.
2. Find the t-value when µ = 18.5, x̄ = 19, s = 2.5, and n = 16.
3. Find the t-value when µ = 65.5, x̄ = 63, s = 8.1, and n = 10.
4. Find the t-value when µ = 200, x̄ = 197, s = 10, and n = 14.
5. Find the t-value when µ = 127, x̄ = 121, s = 14.1, and n = 18.
6. Find the t-value when µ = 67.2, x̄ = 68.5, s² = 4, and n = 11.
7. Find the t-value when µ = 77, x̄ = 85, s² = 12, and n = 22.
8. Find the t-value when µ = 14.6, x̄ = 11.7, s² = 27, and n = 8.
9. Find the t-value when µ = 4.7, x̄ = 4.5, s² = 0.81, and n = 17.
10. Find the t-value when µ = 9.25, x̄ = 10.12, s² = 1.4, and n = 15.


Week # ____8_______

LESSON 8: Estimating Population Mean (µ) When σ Is Known or When n ≥ 30


Estimating Population Mean (µ) When σ Is Unknown and n < 30
Estimating Population Proportion (p) for Large Sample Size

I. INTRODUCTION
Since a population is usually large, describing it (determining its parameters) directly is very difficult. This is one of the reasons why we need Statistics. In this lesson, we will discuss estimating population parameters.

II. LESSON OBJECTIVES


a. Illustrate point and interval estimation.
b. Distinguish between point and interval estimation, and apply the estimation of means and proportions to real-life problems in different disciplines.
c. Identify the point estimator for the population mean.

III. PRE- ASSESSMENT


Compute the margin of error of µ given the level of confidence, population standard deviation σ, and sample size n.
1. Confidence level = 90%, σ = 3, n = 30
2. Confidence level = 95%, σ = 8, n = 20
3. Confidence level = 97%, σ = 12, n = 35
4. Confidence level = 99%, σ = 21, n = 50
5. Confidence level = 98%, σ = 15, n = 55
6. Confidence level = 97%, σ = 8.2, n = 40
7. Confidence level = 90%, σ = 14.4, n = 36
8. Confidence level = 95%, σ = 8.8, n = 144
9. Confidence level = 95%, σ = 12.8, n = 27
10. Confidence level = 98%, σ = 9, n = 80

IV. LESSON CONTENT


1. Estimating Population Mean (µ) When σ Is Known or When n ≥ 30
Estimation can be either point estimation or interval estimation. Point estimation alone is seldom used, since sample statistics (estimators) fluctuate from sample to sample. This makes interval estimation very useful.
Definition
Statistical inference is drawing conclusions or generalizations about a population based on the study of samples.

Point estimate is a single value that estimates the population parameter, such as x̄ as an estimate of µ or s as an estimate of σ.

Interval estimate, sometimes called a confidence interval, is a range or interval (with lower and upper limits) used to estimate the population parameter. It is usually written in the form a < θ < b, which says that the estimated parameter (θ) lies between two values (a and b) at a certain level of confidence.
When the population variance or standard deviation is known, or when n ≥ 30 (by the central limit theorem), the formula below can be used as an interval estimate of the population mean (µ) at a confidence level of (1 - α)100%:

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

Where x̄ = sample mean,
σ = population standard deviation,
n = sample size,
$z_{\alpha/2}$ = z-value that leaves an area of α/2 in the right tail.
The values of $z_{\alpha/2}$ for the confidence levels usually used in estimating a population mean are listed below.

Confidence level (1 - α)100%    90%    95%    97%    98%    99%
$z_{\alpha/2}$                   1.64   1.96   2.17   2.33   2.58

From the formula, $z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the margin of error.
Margin of Error refers to the maximum acceptable difference (determined by α) between the observed sample statistic (mean or proportion) and the true population parameter (mean or proportion).

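Solving the margin of error E = zα/2(σ/√n) for n gives the sample size needed to achieve a desired margin of error:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
where E = margin of error and σ = population standard deviation.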

Example:
Compute the margin of error of the 95% confidence interval estimate of µ when σ = 10 and n = 25.

Answer:
From the table, for the 95% level, zα/2 = 1.96; thus, the margin of error is:
$$z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 1.96\left(\frac{10}{\sqrt{25}}\right) = 1.96(2) = 3.92$$
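The margin-of-error and sample-size formulas can be checked numerically. The Python sketch below is an illustration added here, not part of the module; the function names `z_critical`, `margin_of_error_z`, and `required_sample_size` are hypothetical. It computes zα/2 with the standard library's `statistics.NormalDist` instead of reading the table, so the last digit may differ slightly from the table (for 90%, the exact value is about 1.645 versus the table's 1.64). Rounding the sample size up to the next whole number is the usual convention and is an assumption here.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-tailed critical value z_(alpha/2) for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # The area to the left of z_(alpha/2) is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error_z(confidence: float, sigma: float, n: int) -> float:
    """E = z_(alpha/2) * sigma / sqrt(n)."""
    return z_critical(confidence) * sigma / math.sqrt(n)


def required_sample_size(confidence: float, sigma: float, e: float) -> int:
    """n = (z_(alpha/2) * sigma / E)^2, rounded up to a whole number (assumed convention)."""
    return math.ceil((z_critical(confidence) * sigma / e) ** 2)


# Worked example from the text: 95% level, sigma = 10, n = 25.
print(round(margin_of_error_z(0.95, 10, 25), 2))  # 3.92
```

Reversing the example, `required_sample_size(0.95, 10, 3.92)` returns 25, the sample size that produced that margin of error.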

2. Estimating Population Mean (µ) When σ is unknown and when n < 30

We established earlier that when n < 30 (small sample size), the central limit theorem cannot be applied. Thus, if the population standard deviation σ is unknown and the sample is small, the sample standard deviation s cannot simply take its place, and the interval estimate based on the z-table cannot be used. In this case, the t-distribution is used to construct the interval estimate (provided the population is approximately normally distributed). The formula becomes:
$$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
Where x̄ = sample mean, s = sample standard deviation, n = sample size
tα/2 = t value with n − 1 degrees of freedom that leaves an area of α/2 in the upper tail.
From the formula, tα/2(s/√n) is called the margin of error.

Example:
Compute the 95% confidence interval estimate of µ given the following: s = 9, n = 12, and x̄ = 27.
Answer:
From the given, df = 12 − 1 = 11. Since the confidence level is 95%, α = 5%; thus tα/2 = t0.025. From the t-table, for df = 11, t0.025 = 2.201.
$$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
$$27 - 2.201\left(\frac{9}{\sqrt{12}}\right) < \mu < 27 + 2.201\left(\frac{9}{\sqrt{12}}\right)$$
$$27 - 2.201(2.60) < \mu < 27 + 2.201(2.60)$$
$$27 - 5.72 < \mu < 27 + 5.72$$
$$21.28 < \mu < 32.72$$
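The t-interval can be sketched the same way. This is a minimal Python illustration, assuming the critical value tα/2 is looked up in a t-table with n − 1 degrees of freedom and passed in (the standard library has no t-distribution); the function name `t_interval` is hypothetical. Note that s/√n uses the sample size n = 12 from the example.

```python
import math


def t_interval(x_bar: float, s: float, n: int, t_crit: float) -> tuple:
    """Return (lower, upper) for x_bar -/+ t_(alpha/2) * s / sqrt(n).

    t_crit is read from a t-table with n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Worked example: s = 9, n = 12, x_bar = 27, and t_0.025 with df = 11 is 2.201.
lower, upper = t_interval(27, 9, 12, 2.201)
print(f"{lower:.2f} < mu < {upper:.2f}")  # 21.28 < mu < 32.72
```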


3. Estimating Population Proportion (p) for Large Sample Size

Estimating a population proportion is similar to estimating a population mean. When the sample proportion p̂ (pronounced "p-hat") is computed from a large sample of size n, the interval estimate of the population proportion p at a confidence level of (1 − α)100% can be computed as:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Where p̂ = sample proportion, q̂ = 1 − p̂, n = sample size, zα/2 = z value that leaves an area of α/2 in the upper tail

Example:
Compute the 90% confidence interval estimate of p given the following: p̂ = 0.65 and n = 50.

Answer:
Since p̂ = 0.65, q̂ = 0.35. From the table, for the 90% level, zα/2 = 1.64.
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.65 - 1.64\sqrt{\frac{(0.65)(0.35)}{50}} < p < 0.65 + 1.64\sqrt{\frac{(0.65)(0.35)}{50}}$$


$$0.65 - 1.64(0.07) < p < 0.65 + 1.64(0.07)$$
$$0.65 - 0.11 < p < 0.65 + 0.11$$
$$0.54 < p < 0.76$$
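A minimal Python sketch of the proportion interval follows, assuming zα/2 is taken from the table (1.64 for 90%); the function name `proportion_interval` is hypothetical. Unlike the hand computation, it does not round the standard error to 0.07 first, but the limits agree to two decimal places.

```python
import math


def proportion_interval(p_hat: float, n: int, z_crit: float) -> tuple:
    """Return (lower, upper) for p_hat -/+ z_(alpha/2) * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    margin = z_crit * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Worked example: p_hat = 0.65, n = 50, 90% level (z = 1.64 from the table).
lower, upper = proportion_interval(0.65, 50, 1.64)
print(f"{lower:.2f} < p < {upper:.2f}")  # 0.54 < p < 0.76
```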

V. PRACTICE
Compute the margin of error for estimating µ given the level of confidence, sample standard deviation s, and sample size n.
1. Confidence level = 90%, s = 3, n = 10
2. Confidence level = 95%, s = 8, n = 20
3. Confidence level = 98%, s = 12, n = 12
4. Confidence level = 99%, s = 21, n = 15
5. Confidence level = 98%, s = 15, n = 21
6. Confidence level = 90%, s = 8.2, n = 8
7. Confidence level = 90%, s = 14.4, n = 26
8. Confidence level = 95%, s = 8.8, n = 17
9. Confidence level = 95%, s = 12.8, n = 13
10. Confidence level = 98%, s = 9, n = 19

VI. ENRICHMENT
Solve the following problems:
1. A coffee machine is regulated so that the amount it dispenses is normally distributed. A random sample of 21 cups had an average of 8 ounces with a standard deviation of 0.5 ounces. Construct a 95% confidence interval estimate for the average amount of all cups of coffee dispensed by this machine.
2. The average weight of 15 adult Dagupan bangus is 750 grams with a standard deviation of 80 grams. Construct a 98% confidence interval estimate of the average weight of all adult Dagupan bangus.
3. Ten (10) “Taklobo” or giant clams have an average length of 45 inches across their shells with a standard deviation of 4 inches. Construct a 95% confidence interval estimate of the average length across the shells of all giant clams.
4. In a personnel services analytics study, 20 managers were found to spend a mean of 2.5 hours each day on paperwork with a standard deviation of 1.2 hours. Construct a 90% confidence interval estimate of the average time spent on paperwork by all managers.
5. A study was conducted to test a new variety of rice. A sample of 5 plots showed the yields per square meter recorded below. Construct a 95% confidence interval estimate of the average yield per square meter of the new variety of rice.
Plot               1     2     3     4     5
Yield (kg/m²)      2     2.5   3     1.6   2.4
(Hint: First compute the sample mean and the sample standard deviation.)
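The hint for item 5 can be carried out with Python's standard `statistics` module. This sketch covers only that first step; the variable names are illustrative. `stdev` divides by n − 1, which gives the sample standard deviation s needed by the t-interval. The interval itself still requires the t-table value with 5 − 1 = 4 degrees of freedom.

```python
from statistics import mean, stdev

# Yields (kg/m^2) of the five plots in Enrichment item 5.
yields = [2, 2.5, 3, 1.6, 2.4]

x_bar = mean(yields)  # sample mean
s = stdev(yields)     # sample standard deviation (divides by n - 1)
print(f"x_bar = {x_bar:.2f}, s = {s:.4f}")
```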

VII. EVALUATION
A. Compute the interval estimate of µ given the confidence level, sample mean x̄, population standard deviation σ, and sample size n.
1. Confidence level = 90%, x̄ = 42, σ = 10, and n = 40
2. Confidence level = 98%, x̄ = 21, σ = 15, and n = 50
3. Confidence level = 95%, x̄ = 142, σ = 9, and n = 25
4. Confidence level = 99%, x̄ = 28, σ = 12, and n = 60
5. Confidence level = 97%, x̄ = 45, σ = 8, and n = 140
B. Compute the interval estimate of µ given the confidence level, sample mean x̄, sample standard deviation s, and sample size n.
1. Confidence level = 90%, x̄ = 42, s = 10, and n = 20
2. Confidence level = 98%, x̄ = 21, s = 15, and n = 10
3. Confidence level = 95%, x̄ = 142, s = 9, and n = 15
4. Confidence level = 99%, x̄ = 28, s = 12, and n = 11
5. Confidence level = 90%, x̄ = 45, s = 8, and n = 16
C. Compute the interval estimate for p given the level of confidence, sample proportion p̂, and sample size n.
1. Confidence level = 95%, p̂ = 0.3, n = 30
2. Confidence level = 90%, p̂ = 0.8, n = 50
3. Confidence level = 98%, p̂ = 0.6, n = 35
4. Confidence level = 99%, p̂ = 0.5, n = 70
5. Confidence level = 90%, p̂ = 0.15, n = 55


Week # ____1____

LESSON 1: The Basics of Hypothesis Testing


I. INTRODUCTION
In the previous lesson, we discussed estimation as a way to describe a population (which is usually large) by creating a point or an interval estimate. In this lesson, we shall use another method of describing a population, called hypothesis testing.

II. LESSON OBJECTIVES


a. Demonstrate understanding of key concepts and elements of hypothesis formulation and testing.
b. Appreciate the importance of testing as a guide for drawing conclusions.
c. Apply hypothesis testing to real-life problems.

III. PRE- ASSESSMENT


For each of the following, determine whether the statement is a null hypothesis or an alternative hypothesis.
______1. There is significant difference between the performance of male and female students in Mathematics.
______2. The study hours of the students increased significantly.
______3. There is no significant decrease in the crime rates in remote areas.
______4. There is significant difference in the salaries of teachers according to their ranks.
______5. The monthly rentals of apartments in Manila and Makati are equal.
______6. The average scores of the players in five basketball teams are not the same.
______7. The cost of living in urban areas is significantly higher than the cost of living in rural areas.
______8. The reaction of the workers to the collective bargaining agreement (in favor or not in favor) is related to their status (contractual or permanent).
______9. The introduction of new packaging significantly increased the revenue of the company.
______10. The decision of the students whether or not to join the activity is dependent on their gender.

IV. LESSON CONTENT


Suppose a study is concerned with the average number of minutes a senior high school student spends reading his or her previous lesson in Mathematics. By merely knowing the purpose of the study, a researcher already has a claim, guess, or conjecture. This claim or guess is called a hypothesis (plural: hypotheses). For example, before this study was conceptualized, a report showed that elementary pupils read their math books for an average of 25 minutes. Thus, the researcher of the study may have the following claims or hypotheses in mind:
a. Senior high school students read their previous lesson in Math for an average of 25 minutes per day.
b. Senior high school students read their previous lesson in Math for an average of less than 25 minutes per day.
c. Senior high school students read their previous lesson in Math for an average of more than 25 minutes per day.

The hypotheses above can be written as:


a) µ = 25
b) µ < 25
c) µ > 25
Hypotheses (b) and (c) can be combined as µ ≠ 25.

Hypotheses or claims can be classified as the Null Hypothesis (H0) or the Alternative Hypothesis (Ha). Hypothesis (a) is a null hypothesis, while hypotheses (b) and (c) are alternative hypotheses.
Definition
Hypothesis Testing is a process of gathering evidence to either support or rebut a claim or conjecture, known as a hypothesis.
Null Hypothesis (H0) is a claim that denotes “absence,” such as absence of difference, absence of relationship, or equality to a certain value. It usually comes with “=,” “≥,” or “≤” when written in symbols.
Alternative Hypothesis (Ha) is a claim that denotes “presence,” such as presence of difference, presence of relationship, or inequality to a certain value. It usually comes with “≠,” “<,” or “>” when written in symbols.

The “rejection” of H0 will lead to the “acceptance” of Ha.

Note: The Null Hypothesis is the TESTABLE hypothesis.


A hypothesis that uses < or > is called directional (one-tailed), while a hypothesis that uses ≠ is called non-directional (two-tailed).

After testing a hypothesis, a decision must be made as the basis for a conclusion: either reject or do not reject the null hypothesis (the testable hypothesis). Four outcomes are possible: two right decisions and two wrong decisions.

Right Decisions:
✓ rejecting a false null hypothesis
✓ not rejecting a true null hypothesis.
Wrong Decisions:
x rejecting a true null hypothesis (Type I Error)
x not rejecting a false null hypothesis (Type II Error)

As shown above, two possible errors could be committed. The probability of committing a Type I error is denoted by α (Greek letter alpha), while the probability of committing a Type II error is denoted by β (Greek letter beta). However, in testing a hypothesis, only α, the probability of committing a Type I error, is set in advance. The commonly used values of α are 0.05 and 0.01. If α = 0.05, then the probability of rejecting a true null hypothesis is 5%, which means that the probability of not rejecting a true null hypothesis is 95% (this is NOT the value of β).
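To see what α = 0.05 means in practice, the following Python sketch (an illustration added here, not part of the module's procedure) repeatedly tests a null hypothesis that is actually true. The population values mu0 = 50 and sigma = 10, the sample size 30, and the function name `simulate_type_i_error` are hypothetical choices. Because H0 is true in every trial, each rejection is a Type I error, so the proportion of rejections should come out near 0.05.

```python
import math
import random


def simulate_type_i_error(trials=10_000, n=30, mu0=50.0, sigma=10.0,
                          z_crit=1.96, seed=1):
    """Estimate the Type I error rate of a two-tailed z-test by simulation.

    Every sample comes from a normal population whose mean really equals mu0,
    so H0: mu = mu0 is TRUE and every rejection is a Type I error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / standard_error
        if abs(z) >= z_crit:  # critical value method, two-tailed
            rejections += 1
    return rejections / trials


rate = simulate_type_i_error()
print(f"Estimated Type I error rate: {rate:.3f}")  # should be close to alpha = 0.05
```

Changing `z_crit` to 2.58 (α = 0.01) should bring the estimated rate down to roughly 0.01.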

STEPS IN TESTING HYPOTHESIS (Critical Value Method)


1. State the Null Hypothesis (H0) and the Alternative Hypothesis (Ha).
2. Identify the statistical test to be used, the value of α, and the critical value of the test statistic.
3. Compute the test statistic.
4. Make a decision (reject or do not reject H0).
5. State the conclusion in non-technical terms.

V. PRACTICE
A social worker wants to test (at α = 0.05) whether the average body mass index (BMI) of the pupils under a feeding program is different from 8.2 kg/m².
a. State the null and alternative hypothesis in words.
b. State the null and alternative hypothesis in symbols.
c. What is the probability of committing Type I error?
d. State the conclusion when H0 is rejected.
e. State the conclusion when H0 is not rejected.

VI. ENRICHMENT
For each of the situations below, answer the following.
a. State the null and alternative hypothesis in words.
b. State the null and alternative hypothesis in symbols.
c. What is the probability of committing Type I error?
d. State the conclusion when H0 is rejected.
e. State the conclusion when H0 is not rejected.

1. A college dean claims that a bachelor’s degree can be earned in an average of five years.
Test the claim at the 95% confidence level.
a) H0: _______________________________________________
Ha: _______________________________________________
b) H0: __________ Ha:________________________
c) ___________________________________________________
d) ___________________________________________________
e) ___________________________________________________
2. An FDA officer claims that Pharma XYZ’s new caplet contains less than 300 mg of paracetamol.
Test the claim at the 99% confidence level.
a) H0: _______________________________________________
Ha: _______________________________________________
b) H0: __________ Ha:________________________
c) ___________________________________________________
d) ___________________________________________________
e) ___________________________________________________
3. A cigarette manufacturer claims that the average nicotine content per stick is 2.1 mg.
Test the claim at the 90% confidence level.
a) H0: _______________________________________________
Ha: _______________________________________________
b) H0: __________ Ha:________________________
c) ___________________________________________________
d) ___________________________________________________
e) ___________________________________________________


4. A real estate agent claims that 60% of all condominium units built today are studio-type. Test the claim
at the 98% confidence level.
a) H0: _______________________________________________
Ha: _______________________________________________
b) H0: __________ Ha:________________________
c) ___________________________________________________
d) ___________________________________________________
e) ___________________________________________________
5. The PQR Chamber of Commerce claims that the mean annual income of its members is US$60,000. Test the claim
at the 95% confidence level.
a) H0: _______________________________________________
Ha: _______________________________________________
b) H0: __________ Ha:________________________
c) ___________________________________________________
d) ___________________________________________________
e) ___________________________________________________

VII. EVALUATION

Write the Null Hypothesis (H0) or the Alternative Hypothesis (Ha) of the following:

1. H0: µ = 10.2 Ha: ______________

2. H0: __________ Ha: µ ≠ 32.6

3. H0: __________ Ha: µ > 45

4. H0: µ ≥ 101.6 Ha: ______________

5. H0: __________ Ha: 𝑝 ≠ 0.87

6. H0: The average IQ of Grade 10 students is 110

Ha: ______________________________________________________________

7. H0: The proportion of male rats in the population of rats is 45%

Ha: ______________________________________________________________

8. H0: ______________________________________________________________

Ha: The mean height of Asian women is different from 61 inches.

9. H0: ______________________________________________________________

Ha: The proportion of obese children ages 3-10 is higher than 40%.

10. H0: ______________________________________________________________

Ha: The mean amount of dispensed coffee of a new vending machine is greater than 300ml.


Week # _____2_______

LESSON 2: Testing Hypothesis about Population Mean

I. INTRODUCTION
As mentioned earlier, a hypothesis or claim about a population mean or population proportion can be tested using the five-step hypothesis testing procedure. This text covers only two basic tests of hypotheses about a population mean using a single sample.

II. LESSON OBJECTIVES


a. Understand the concept of a test statistic.
b. Illustrate understanding of the assumptions of the z-test.
c. Illustrate understanding of the assumptions of the t-test.
III. PRE- ASSESSMENT
Determine the decision for each of the following, given the computed and critical values of z or t:
a. zcomputed = 1.82 zcritical = 1.96
b. zcomputed = 2.54 zcritical = 2.33
c. tcomputed = 2.02 tcritical = 1.771
d. tcomputed = 2.24 tcritical = 2.552

IV. LESSON CONTENT

Z-TEST (σ known, or n ≥ 30)

One of the most common tests for a population mean is the z-test, which uses the properties of the z-distribution (normal distribution) when the population standard deviation (σ) or variance (σ²) is known. It is also used when the sample size n is greater than or equal to 30 (n ≥ 30) by virtue of the central limit theorem.
The z-test formula is:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Where: µ0 = claimed population mean, σ = population standard deviation (can be replaced by the sample standard deviation s when n ≥ 30), x̄ = sample mean, n = sample size

Decision rule (for critical value method):


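➢ If zcomputed ≥ zcritical, REJECT H0.
➢ If zcomputed < zcritical, DO NOT REJECT H0.

Note:
The NEGATIVE sign of the computed z is disregarded when comparing it with the critical value of z if the hypothesis is non-directional. For a left-tailed test (Ha: µ < µ0), H0 is rejected when the computed z is less than or equal to the negative of zcritical.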
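The critical value method, including the sign handling for one- and two-tailed tests, can be sketched in Python as follows. This is an illustration only: the function name `z_test` and its `tail` argument are hypothetical, and zcritical is passed in from the table rather than computed. The usage line reproduces the printer example from this lesson's Enrichment section.

```python
import math


def z_test(x_bar, mu0, sigma, n, z_crit, tail="two"):
    """Return (z, reject_h0) using the critical value method.

    tail: "two" for Ha: mu != mu0, "right" for Ha: mu > mu0, "left" for Ha: mu < mu0.
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if tail == "two":
        reject = abs(z) >= z_crit  # the negative sign is disregarded
    elif tail == "right":
        reject = z >= z_crit
    elif tail == "left":
        reject = z <= -z_crit
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z, reject


# Printer example: x_bar = 1475, mu0 = 1500, sigma = 60, n = 35, two-tailed at alpha = 0.05.
z, reject = z_test(1475, 1500, 60, 35, 1.96)
print(f"z = {z:.2f}, reject H0: {reject}")  # z = -2.47, reject H0: True
```

Returning the signed z keeps the direction visible, while the `tail` argument decides whether the sign matters for the decision.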

T-TEST (σ unknown and n < 30)

If the sample size is small (n < 30) and the population standard deviation or variance is unknown, the z-test cannot be used. In the special case where the population from which the samples are taken is known to be normally distributed, the t-test can be used to test a claim or hypothesis about the population mean.
The t-test formula is:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Where: µ0 = claimed population mean, s = sample standard deviation, x̄ = sample mean, n = sample size. The critical value of t is read from the t-table with n − 1 degrees of freedom.

Decision rule (for critical value method):


➢ If tcomputed ≥ tcritical, REJECT H0.
➢ If tcomputed < tcritical, DO NOT REJECT H0.


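Note:
The NEGATIVE sign of the computed t is disregarded when comparing it with the critical value of t if the hypothesis is non-directional. For a left-tailed test (Ha: µ < µ0), H0 is rejected when the computed t is less than or equal to the negative of tcritical.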
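A similar Python sketch for the t-test, assuming the critical value is looked up in a t-table with n − 1 degrees of freedom. The function name `t_statistic` is hypothetical, and so is the claimed mean µ0 = 30 below; the sample values (x̄ = 27, s = 9, n = 12) and t0.025 = 2.201 for df = 11 are borrowed from the confidence-interval example in the previous lesson.

```python
import math


def t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n)); compare with the t-table value at df = n - 1."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical claim mu0 = 30, two-tailed test at alpha = 0.05, df = 11.
t = t_statistic(27, 30, 9, 12)
t_critical = 2.201
reject = abs(t) >= t_critical  # sign disregarded for a non-directional test
print(f"t = {t:.2f}, reject H0: {reject}")  # t = -1.15, reject H0: False
```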

V. PRACTICE
Determine the decision for each of the following, given the computed and critical values of z or t:
1. zcomputed = 1.02 zcritical = 1.64 Decision: _______________________
2. zcomputed = 2.15 zcritical = 1.96 Decision: _______________________
3. zcomputed = 2.24 zcritical = 2.33 Decision: _______________________
4. zcomputed = 0.16 zcritical = 2.17 Decision: _______________________
5. zcomputed = 3.25 zcritical = 1.28 Decision: _______________________
6. tcomputed = 5.26 tcritical = 2.55 Decision: _______________________
7. tcomputed = 1.97 tcritical = 2.12 Decision: _______________________
8. tcomputed = 0.26 tcritical = 1.53 Decision: _______________________
9. tcomputed = 1.19 tcritical = 1.31 Decision: _______________________
10. tcomputed = 2.89 tcritical = 1.86 Decision: _______________________

VI. ENRICHMENT
A printer manufacturing company claims that its new ink-efficient printer can print an average of 1500 pages of word documents with a standard deviation of 60 pages. Thirty-five (35) of these printers showed a mean of 1475 pages. Does this support the company’s claim? Use a 95% confidence level.

Using the five-step hypothesis testing procedure:


1. Null Hypothesis (H0) and Alternative Hypothesis (Ha)
H0: µ = 1500
Ha: µ ≠ 1500
2. Statistical Test = z-test (two-tailed)
α = 0.05 (since the confidence level is 95%)
zcritical = 1.96
3. Computation:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1475 - 1500}{60/\sqrt{35}} = \frac{-25}{10.14} \approx -2.47$$
(The NEGATIVE sign can be disregarded since the test is two-tailed, so the value compared is 2.47.)
4. Decision (reject or do not reject H0)
Since the computed z (disregarding the negative sign), 2.47, is greater than the critical value of z, 1.96, H0 is REJECTED.
5. Conclusion:
There is sufficient evidence to reject the company’s claim that its printers print an average of 1500 pages.

VII. EVALUATION

Solve the following using the five-step hypothesis testing procedure.


1. A cosmetics company claims that teenage ladies spend an average of P150 monthly on cosmetic products. To verify this claim, a researcher surveyed 40 teenage ladies and found that their mean monthly expense for cosmetic products is P146 with a standard deviation of P12. Using α = 0.05, test whether the company’s claim is realistic.
2. An auto battery company claims that its batteries’ mean life is 50 months. To check this claim, a DTI researcher took a random sample of 18 of these batteries and found that the mean life is 48.8 months with a standard deviation of 7 months. Assuming that battery life follows a normal distribution, test at the 90% confidence level whether the true mean differs from the company’s claim.
3. In a certain city, a school administrator hypothesized that students enroll in schools within 5 km of their homes. To check this claim, you asked 30 students from the city and found that the mean distance between their schools and homes is 5.3 km with a standard deviation of 0.2 km. Is the true mean greater than the hypothesized mean? Use α = 0.05.


Week # _____3_____

LESSON 3: Testing Hypothesis about Proportion

I. INTRODUCTION
We mentioned in a previous lesson that the population proportion can be estimated only for a large sample size (n ≥ 30). The same is true when testing a claim or hypothesis about the population proportion (p).

II. LESSON OBJECTIVES


a. Demonstrate understanding of key concepts of testing hypotheses about a population proportion.
b. Appreciate the importance of testing hypotheses involving a population proportion.
c. Perform appropriate tests of hypotheses involving a population proportion in real-life problems.

III. PRE- ASSESSMENT


Compute z for each, given the claim (p), the observed proportion (p̂), and the sample size (n).
a) p = 0.3, p̂ = 0.4, n = 60
b) p = 0.8, p̂ = 0.72, n = 30
c) p = 0.66, p̂ = 0.61, n = 40
d) p = 0.12, p̂ = 0.13, n = 120

IV. LESSON CONTENT


A researcher who is studying the rapid growth of the rat population wants to determine the proportion of female rats in a certain region. He doesn’t need to catch every rat he sees and record its gender; he only needs a sufficient sample from which he can make an inference about the proportion of female rats.

In the example, the researcher may initially believe that 50% of the rat population is female. Suppose he has set up traps to collect rats in different parts of the region, and out of the 50 rats he has collected, 23 are female. Would this support his initial belief?

To test a claim about a population proportion, we use the z-test for a population proportion.
The formula is:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

Where: p = claimed/hypothesized proportion, p̂ = sample proportion, q = 1 − p, n = sample size

As in the use of the z-test for means, the decision rule below is used:
➢ If zcomputed ≥ zcritical REJECT H0
➢ If zcomputed < zcritical DO NOT REJECT H0
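The z-test for a proportion can be sketched in Python as below. It is an illustration under stated assumptions: the function name `z_proportion` is hypothetical, the standard error uses the claimed proportion p and q = 1 − p (as in the formula above), and zcritical = 1.96 is taken from the table for a two-tailed test at α = 0.05. The data are the rat example from this lesson.

```python
import math


def z_proportion(p_hat: float, p0: float, n: int) -> float:
    """z = (p_hat - p0) / sqrt(p0 * q0 / n), where q0 = 1 - p0.

    The standard error uses the CLAIMED proportion p0, not the sample proportion.
    """
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)


# Rat example: 23 of 50 trapped rats are female; the claim is p = 0.5.
z = z_proportion(23 / 50, 0.5, 50)
z_crit = 1.96                # two-tailed, alpha = 0.05
reject = abs(z) >= z_crit    # sign disregarded for a two-tailed test
print(f"z = {z:.2f}, reject H0: {reject}")
```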

V. PRACTICE
From the example above, the researcher wants to test his belief that 50% (0.5) of the rat population is female. From his collected samples, 23 out of 50 are female. Does this support his claim? Use α = 0.05.

Using the five-step hypothesis testing procedure.

1. Null Hypothesis (H0) and Alternative Hypothesis (Ha)

H0: p = ______

Ha: p ≠ _______

2. Statistical Test = z-test (two-tailed)

α = 0.05

zcritical = 1.96


3. Computation:
From the problem, p̂ = 23/50 = 0.46. Since the claimed proportion is p = 0.5, q = 1 − 0.5 = 0.5.

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.46 - 0.5}{\sqrt{(0.5)(0.5)/50}} = \frac{?}{?}$$

z = _______ (The NEGATIVE sign can be disregarded since the test is two-tailed.)
4. Decision (reject or do not reject H0)
Since the computed z (disregarding the negative sign) is (less than? or greater than?) the critical value of z, H0 is (REJECTED? or NOT REJECTED?).
5. Conclusion:
There is (sufficient? or insufficient?) evidence to reject the researcher’s claim. Thus, the claim that 50% of the rat population is female is (supported? or not supported?).

VI. ENRICHMENT

Solve the following using the five-step hypothesis testing procedure.


1. A movie regulations officer claims that 30% of all movies made are Rated G. If 13 out of 40 randomly selected movies are Rated G, is the officer’s claim supported? Use a 95% confidence level.
2. A television channel claims that 40% of the population who watch TV patronize its channel. After surveying a sample of 70 viewers, the channel found that 30 of them watch it. Use α = 0.01 to test the claim.
3. A medical technologist claims that 25% of chain smokers are teenagers. After surveying 42 chain smokers, he found that 8 of them are teenagers. Using a 95% confidence level, test whether his claim is supported.
4. A college dean reported that 80% of all students majoring in math are male. If 62 out of 80 randomly selected college students who major in math are male, is his claim supported? Use a 99% confidence level.
5. A teacher believes that less than 20% of the students like Mathematics. If 13 out of 60 randomly selected students like Mathematics, is the teacher’s claim valid? Use a 90% confidence level.

VII. EVALUATION

Compute z for each, given the claim (p), the observed proportion (p̂), and the sample size (n).

1. p = 0.2, p̂ = 0.18, n = 50

2. p = 0.8, p̂ = 0.72, n = 160

3. p = 0.66, p̂ = 0.61, n = 40

4. p = 0.12, p̂ = 0.13, n = 120

5. p = 0.7, p̂ = 0.68, n = 30

6. p = 0.85, p̂ = 0.88, n = 45

7. p = 0.6, p̂ = 0.65, n = 36

8. p = 0.12, p̂ = 0.1, n = 49

9. p = 0.53, p̂ = 0.54, n = 100

10. p = 0.28, p̂ = 0.27, n = 32


Week # ____4______

LESSON 4: Bivariate Data and Scatterplots

I. INTRODUCTION
A college mathematics instructor wants to analyze the grades of his 30 students in English and Mathematics. He asked the students to write on a piece of paper their grades in these two subjects, as well as the school they graduated from. The teacher tallied the data collected and set up three tables. He found that 10 students graduated from public schools, 10 from private sectarian schools, and 10 from private non-sectarian schools.
How do we determine the relationship between the Math and English grades for each group of students?

II. LESSON OBJECTIVES


a. Draw the scatter diagram of bivariate data.
b. Determine the relationship of bivariate data using scatterplot.

III. PRE- ASSESSMENT


For each of the following cases, state whether a positive correlation, a negative correlation, or no correlation exists.
1. Total family income and family expenses
2. Mathematics grades and height of students
3. Number of absences and grades
4. Company sales and advertising expenses
5. Number of policemen and number of crimes recorded
6. Number of pages and price of the book
7. Weight and singing ability
8. Educational attainment and salary
9. School allowance and academic performance
10. Size of the family and expenses.

IV. LESSON CONTENT


Since the mathematics instructor wants to determine the relationship between the English and Mathematics grades, we say that he is analyzing bivariate data, because the data come from two variables.
Bivariate data involve two variables (either qualitative or quantitative) that can be explored to establish relationships. One of the best methods for graphing quantitative bivariate data is the scatterplot.
Key Concepts

Bivariate data – data that comes from two variables


Examples of bivariate data that are quantitative:
1. IQ and Academic performance
2. Student population and teacher salary
3. Family size and food consumption
Examples of bivariate data where at least one of the variables is qualitative:
1. Average salary of teachers according to rank and type of school
2. Number of teachers according to gender and type of school
3. Type of residence and number of family members in each household
Scatter plot – (scatter diagram) the graphical representation of bivariate data.

                  Public             Private Sectarian     Private Non-Sectarian
Student     English    Math     English    Math         English
1           78         76       77         95           75
2           79         81       79         91           79
3           80         79       81         92           80
4           82         84       83         88           83
5           85         83       85         87           85
6           87         86       88         86           88
7           88         89       89         80           90
8           89         88       90         81           90
9           92         91       93         79           92
10          93         94       94         77           95

Let us now determine the relationship between the English and Math grades for each group using a scatterplot. Since Math and English grades are both quantitative, one variable (Math grade) is plotted along the horizontal axis and the other variable (English grade) along the vertical axis.

From the scatterplots, we can say that there is a positive correlation between the Math and English grades of students from the public schools, a negative correlation between the Math and English grades of students from the private sectarian schools, and no correlation between the Math and English grades of students from the private non-sectarian schools. A perfect correlation occurs when all the points lie on a straight line.
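The plotting rule described above (first variable on the horizontal axis, second on the vertical axis) can be illustrated without graphing software. The Python sketch below is a hypothetical helper, `text_scatter`, that prints a crude character plot; the 20-by-10 grid size is an arbitrary choice, and the function assumes the x values (and the y values) are not all identical. It uses the public-school grades from the table, with Math on the horizontal axis and English on the vertical axis.

```python
def text_scatter(xs, ys, width=20, height=10):
    """Return the rows of a crude text scatterplot, top row first.

    x values run along the horizontal axis and y values along the vertical axis;
    each '*' marks one (x, y) pair. Assumes xs and ys are not all identical.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * width)
        row = round((y - y_min) / (y_max - y_min) * height)
        grid[height - row][col] = "*"  # flip so larger y values appear higher
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * (width + 1)]


# Public-school grades: Math (horizontal axis) and English (vertical axis).
math_grades = [76, 81, 79, 84, 83, 86, 89, 88, 91, 94]
english_grades = [78, 79, 80, 82, 85, 87, 88, 89, 92, 93]
for line in text_scatter(math_grades, english_grades):
    print(line)
```

The stars rise from the lower left to the upper right, which is the pattern of a positive correlation.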

V. PRACTICE
Write the statement showing the relationship between the two variables in each scatterplot.
[Scatterplots for items 1 to 4]

VI. ENRICHMENT
A candy retailer repacked candies in different ways and recorded the number of packs sold based on the number of candies per pack. He noticed that very few packs are sold when there are 100 candies in a pack, so he offers a discount per pack of 100 candies based on the number of packs bought.
No. of pieces per pack    No. of packs sold    No. of packs (100 pcs. per pack)    Discount per pack (in P)
10 30 3 P2
20 27 4 P4
30 24 5 P6
40 21 6 P8
50 18 7 P10
60 15 8 P12
70 12 9 P14
80 9 10 P16
90 6 11 P18
100 3 12 P20
Draw the scatterplot for each of the following and interpret it.
1. Number of candies per pack and number of packs sold
2. Number of packs (100 pcs. per pack) bought and discount given


VII. EVALUATION
Draw the scatterplot and write the statement showing the relationship between the bivariate data.

1. English and Science grades of five science club officers.

English grade 80 80 85 88 89

Science grade 80 90 83 80 89

2. Anxiety level and average grade of selected grade 11 students.

Anxiety level 2 3 5 7 8 9

GPA 95 92 91 88 86 87

3. Family size and savings of selected families.

Family size 3 5 6 7 8 9 10

Savings (P) 1700 1500 1250 1300 850 800 400

4. Communication skills and confidence level of 8 sales agents.

Communication skills 4 5 6 7 7 8 9 10

Confidence level 5 4 7 7 6 9 10 9

5. Weight and IQ of selected computer programmers.

Weight 45 48 50 51 53 54 57 60 63

IQ 110 120 105 125 130 121 125 105 115

6. Distance traveled and the amount of fuel.

Distance (km)            0     50    100    150    200    250    300

Fuel in tank (liters)    80    73    67     61     52     46     37


Week # ____5_____

LESSON 5: Coefficient of Correlation

I. INTRODUCTION
In this lesson, you will learn how to determine the association between bivariate data using a scatterplot, and how to compute and interpret the correlation coefficient and the coefficient of determination.

II. LESSON OBJECTIVES


a. Compute the coefficient of correlation of bivariate data.
b. Determine the degree of association between bivariate data.
c. Identify dependent and independent variables from bivariate data.

III. PRE- ASSESSMENT


Answer the following:
1. Calculate the value of r for random variables X and Y using the following values:

X 11 17 26
Y 23 18 19

2. The random variable Y is converted to Z by the equation Z = (Y + 10) + 3. Complete the table of Z values.

X 11 17 26
Z

3. Compute the value of r for Y and Z.

IV. LESSON CONTENT


Pearson’s product-moment correlation coefficient, denoted by r, is used to measure the strength and direction of the linear relationship between two variables that are normally distributed.
To find Pearson’s product-moment correlation coefficient, do the following:
1. With the data in tabular form, add three columns for XY, X², and Y².
2. Find the values of the following:
$\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, $\sum Y^2$
3. Find the coefficient of correlation (r) using the formula:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$

where N is the number of (X, Y) pairs.
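The three steps above translate directly into code. A minimal Python sketch, assuming the data are given as two equal-length lists (the function name `pearson_r` is illustrative, not from the module); the usage lines apply it to the pre-assessment data, taking Z = (Y + 10) + 3 exactly as printed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw-score formula using the sums of X, Y, XY, X^2 and Y^2."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    n = len(xs)                                   # N = number of (X, Y) pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Pre-assessment data.
x = [11, 17, 26]
y = [23, 18, 19]
z = [value + 10 + 3 for value in y]  # Z = (Y + 10) + 3

print(round(pearson_r(x, y), 2))  # -0.68
print(pearson_r(y, z))            # 1.0
```

Since Z is Y plus a constant, the points (Y, Z) lie on a straight line with positive slope, so r for Y and Z is exactly 1; adding or subtracting a constant never changes the value of r.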
To interpret the value of the correlation coefficient, use the table below.
Value of r          Interpretation
1.0                 Perfect positive correlation
0.90 to 0.99        Very high positive correlation
0.70 to 0.89        High positive correlation
0.40 to 0.69        Moderate positive correlation
0.20 to 0.39        Small positive correlation
-0.20 to 0.19       Very small; negligible
-0.40 to -0.21      Small negative correlation
-0.70 to -0.41      Moderate negative correlation
-0.90 to -0.71      High negative correlation
-0.99 to -0.91      Very high negative correlation
-1.0                Perfect negative correlation
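The interpretation table can be read as a simple lookup. A minimal sketch, assuming r is first rounded to two decimal places: the table's ranges are written to two decimals and leave small gaps (for example between 0.99 and 1.0), and rounding is only our assumption for handling them. The function name `interpret_r` is hypothetical.

```python
def interpret_r(r):
    """Label a correlation coefficient using the interpretation table.

    r is rounded to two decimals first because the table's ranges are
    written to two decimals (e.g. 0.90 to 0.99).
    """
    r = round(r, 2)
    if r == 1.0:
        return "Perfect positive correlation"
    if r == -1.0:
        return "Perfect negative correlation"
    if r >= 0.90:
        return "Very high positive correlation"
    if r >= 0.70:
        return "High positive correlation"
    if r >= 0.40:
        return "Moderate positive correlation"
    if r >= 0.20:
        return "Small positive correlation"
    if r >= -0.20:
        return "Very small; negligible"
    if r >= -0.40:
        return "Small negative correlation"
    if r >= -0.70:
        return "Moderate negative correlation"
    if r >= -0.90:
        return "High negative correlation"
    return "Very high negative correlation"
```

For example, `interpret_r(0.9144)` returns "Very high positive correlation", and `interpret_r(-0.68)` returns "Moderate negative correlation".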

V. PRACTICE

Advertising expenses (X)   Sales (Y)   XY      X²      Y²
15                         21          315     225     441
18                         27          486     324     729
20                         20          400     400     400
24                         28          672     576     784
16                         17          272     256     289
26                         31          806     676     961
10                         7           70      100     49
16                         14          224     256     196
11                         14          154     121     196
38                         34          1292    1444    1156
36                         39          1404    1296    1521
32                         31          992     1024    961
Total: 262                 283         7087    6698    7683

From the table, compute the coefficient of correlation using the formula, with N = 12:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
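The column totals and r for this table can be checked with a short script. A minimal sketch that builds the XY, X² and Y² sums directly from the listed pairs and applies the raw-score formula (the variable names are illustrative):

```python
from math import sqrt

# Advertising expenses (X) and sales (Y) from the practice table.
x = [15, 18, 20, 24, 16, 26, 10, 16, 11, 38, 36, 32]
y = [21, 27, 20, 28, 17, 31, 7, 14, 14, 34, 39, 31]

n = len(x)  # N = 12 pairs
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 262 283 7087 6698 7683
print(numerator, round(r, 4))                # 10898 0.9144
```

The sums match the Total row of the table, and r ≈ 0.91 falls in the "very high positive correlation" band of the interpretation table.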

VI. ENRICHMENT
The production manager at XYZ Company is interested in determining the nature of the relationship between
training and productivity. The following data were collected over a one-quarter period on 10 employees.
TRAINING LEVEL (HRS) PRODUCTIVITY (UNITS/HR)
18 124
14 110
26 155
9 119
17 137
6 100
22 146
10 123
18 117
12 120

1. Determine the correlation coefficient between these two variables.


2. Comment on the extent of the relationship.
3. Test whether the correlation is significant at the 1% level of significance.
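The module asks for a significance test but does not show how to carry one out. A common approach, assumed here rather than taken from the module, is the t-test for a correlation coefficient: t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, compared with a two-tailed critical value from a t-table (3.355 for df = 8 at α = 0.01). A minimal sketch:

```python
from math import sqrt

# Training level (hours) and productivity (units/hour) for the 10 employees.
training = [18, 14, 26, 9, 17, 6, 22, 10, 18, 12]
productivity = [124, 110, 155, 119, 137, 100, 146, 123, 117, 120]


def pearson_r(xs, ys):
    """Pearson's r from the raw-score formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sx2 = sum(a * a for a in xs)
    sy2 = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


n = len(training)
r = pearson_r(training, productivity)

# t statistic for H0: the population correlation is 0, with n - 2 degrees of freedom.
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Two-tailed critical value for alpha = 0.01 and df = 8, read from a t-table.
T_CRITICAL = 3.355

print(round(r, 4), round(t, 2))
print("significant" if abs(t) > T_CRITICAL else "not significant")
```

With these data r ≈ 0.84 (a high positive correlation by the interpretation table) and t ≈ 4.41 > 3.355, so the correlation would be judged significant at the 1% level. For a 5% two-tailed test with 8 pairs (df = 6), the critical value would instead be 2.447.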

VII. EVALUATION

The head of the production department of an RM electronics company wants to determine the relationship
between the number of workers who assemble the product and the number of units assembled per day.

No. of workers 10 12 14 16 13 20 18 17
No. of units produced 120 180 220 224 176 320 270 275

1. Find the values of the following:
a. $\sum X$ =
b. $\sum Y$ =
c. $\sum XY$ =
d. $\sum X^2$ =
e. $\sum Y^2$ =
f. r =
2. How would you interpret the value of Pearson’s correlation coefficient?
3. Test whether the correlation is significant at the 5% level of significance.
4. Complete the table:

X Y XY X² Y²


Week # ___6_____

LESSON 6: Regression Analysis

I. INTRODUCTION
This lesson focuses on regression analysis, which will help you determine the effect of the independent
variable on the dependent variable and create mathematical models that can be used for prediction.

II. LESSON OBJECTIVES


a. Formulate the least square regression line equation.
b. Draw the least square regression line on a scatterplot.
c. Use the least square regression line and the least square regression equation to predict the value of
the dependent variable.

III. PRE- ASSESSMENT

1. Using the least square regression line, find the passing percentage (to the nearest whole number)
when the class has:
a. 30 students
b. 35 students
c. 40 students
d. 50 students
e. 55 students

IV. LESSON CONTENT

The coefficient of determination (r²) is used to determine how well the least square regression line fits the sample
data. It shows how much the errors in predicting the dependent variable (y) can be reduced by using the
information provided by the independent variable (x); equivalently, it is the proportion of the variation in y that
is explained by the linear relationship with x.

To get the value of the coefficient of determination (r²), compute the coefficient of correlation (r) and square
the result. Since the correlation coefficient ranges from -1 to 1, the coefficient of determination ranges from
0 to 1.

Coefficient of determination:

$$r^2 = \left[\frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}\right]^2$$
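Once r is known, squaring it is one extra step. A minimal sketch that reuses the advertising-and-sales practice data from the lesson on the coefficient of correlation (the helper name `coefficient_of_determination` is hypothetical):

```python
from math import sqrt


def coefficient_of_determination(xs, ys):
    """r squared: compute Pearson's r with the raw-score formula, then square it."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sx2 = sum(a * a for a in xs)
    sy2 = sum(b * b for b in ys)
    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return r ** 2


# Advertising expenses (X) and sales (Y) from the correlation lesson's practice table.
x = [15, 18, 20, 24, 16, 26, 10, 16, 11, 38, 36, 32]
y = [21, 27, 20, 28, 17, 31, 7, 14, 14, 34, 39, 31]

r2 = coefficient_of_determination(x, y)
print(round(r2, 4))  # 0.8362
```

An r² of about 0.84 means that roughly 84% of the variation in sales is accounted for by the linear relationship with advertising expenses.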

Key Concepts
Coefficient of determination – used to determine how well the least square regression line fits the sample data

Least square regression equation – an equation that is used to predict the value of the dependent variable
based on the value of the independent variable.

Least square regression line – the graph of the least square regression equation. It can be drawn on the
scatterplot and used to estimate the value of the dependent variable for a given value of the independent
variable.
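The lesson asks for the slope and y-intercept but does not list formulas for them. The sketch below uses the standard least-squares formulas (an addition, not taken from the module): b = [NΣXY − (ΣX)(ΣY)] / [NΣX² − (ΣX)²] and a = (ΣY − bΣX) / N, giving the prediction equation y' = a + bx. Because store 7's weekly sales are missing from the shelf-space table as reproduced here, the example reuses the complete advertising-and-sales practice data from the correlation lesson; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope b, y-intercept a) of the least square regression line y' = a + bx."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(p * q for p, q in zip(xs, ys))
    sx2 = sum(p * p for p in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n  # same as mean(y) - b * mean(x)
    return b, a


# Advertising expenses (X) and sales (Y) from the correlation lesson's practice table.
x = [15, 18, 20, 24, 16, 26, 10, 16, 11, 38, 36, 32]
y = [21, 27, 20, 28, 17, 31, 7, 14, 14, 34, 39, 31]

b, a = least_squares_line(x, y)
print(f"y' = {a:.2f} + {b:.2f}x")  # y' = 3.30 + 0.93x
print(round(a + b * 20, 2))       # 21.88, the predicted sales when X = 20
```

Note that the slope shares its numerator with r; only the denominator differs, so b and r always have the same sign.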

V. PRACTICE
The marketing manager of a large supermarket chain would like to determine the effect of shelf space on
the sales of pet food. A random sample of 7 stores was selected.

STORE   SHELF SPACE   WEEKLY SALES
1       50            16
2       50            22
3       80            14
4       80            19
5       100           24
6       100           26
7       140

1. Determine the slope and y-intercept.


2. Formulate the least square regression equation.

VI. ENRICHMENT

STORE   SHELF SPACE   WEEKLY SALES
1       50            16
2       50            22
3       80            14
4       80            19
5       100           24
6       100           26
7       140

Using the same table:


1. Draw the scatterplot and the least square regression line.
2. Using the least square regression line, determine the weekly sales when the shelf space is 200 cm.
3. Using the least square regression equation, predict the weekly sales when the shelf space is 215 cm and
when it is 250 cm.
4. Compute the coefficient of determination and interpret the value.

VII. EVALUATION
The production manager at XYZ Company is interested in determining the nature of the relationship between
training and productivity. The following data were collected over a one-quarter period on 10 employees.
TRAINING LEVEL (HRS) PRODUCTIVITY (UNITS/HR)
18 124
14 110
26 155
9 119
17 137
6 100
22 146
10 123
18 117
12 120

Using the same table from the previous lesson:

1. Compute the slope and y-intercept.

2. Draw the scatterplot and the least square regression line.

3. Using the least square regression line, determine the weekly sales when the shelf space is 200 cm.

4. Using the least square regression equation, predict the weekly sales when the shelf space is 215 cm and
when it is 250 cm.

5. Compute the coefficient of determination and interpret the value.
