Lecture 5, 6, 7 - Business Statistics
LECTURES 5, 6, 7_BUSINESS STATISTICS
example, the development team for a new product may assign a probability of 0.60 to the chance
of success for the product, while the president of the company may be less optimistic and assign
a probability of 0.30.
Two events are mutually exclusive if both cannot occur simultaneously. Mutually exclusive
events with non-zero probabilities are dependent, not independent: if one occurs, the
probability of the other becomes zero.
A set of events is collectively exhaustive if one of the events must occur.
Heads and tails in a coin toss are mutually exclusive events. The result of a coin toss cannot
simultaneously be a head and a tail. Heads and tails in a coin toss are also collectively
exhaustive events. One of them must occur. If heads does not occur, tails must occur.
RULES OF PROBABILITY
Conditional probability
Refers to the probability of an event A given information about the occurrence of another event
B, i.e. it is used to quantify how probabilities change in light of new information.
In some cases, there is additional knowledge that might affect the outcome of an experiment,
hence the need to alter the probability of an event of interest. A probability that reflects such
additional knowledge is called the conditional probability of the event.
The probability of A given B is:
$$P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \qquad \text{(eq. 1)}$$
Similarly, the probability of B given A is:
$$P(B|A) = \frac{P(A \text{ and } B)}{P(A)} \qquad \text{(eq. 2)}$$
where P(A and B) = joint probability of A and B, P(A) = marginal probability of A, and
P(B) = marginal probability of B.
The | symbol is read “given” and the event after the | symbol represents information that is
known.
The above formulas allow for derivation of the general multiplication rule, which holds for
any two events, i.e.
$$P(A \text{ and } B) = P(A|B) \times P(B) \qquad \text{(from eq. 1)}$$
$$P(A \text{ and } B) = P(B|A) \times P(A) \qquad \text{(from eq. 2)}$$
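The two conditional probabilities and the multiplication rule can be checked numerically. The sketch below is only an illustration: the joint and marginal probabilities are hypothetical values, not figures from the lecture, and the function name `conditional` is an assumption made for this example.

```python
def conditional(p_joint, p_known):
    """Return P(event | known event) = P(both) / P(known event)."""
    if p_known <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_known

# Hypothetical illustration values (not from the lecture notes).
p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2

p_a_given_b = conditional(p_a_and_b, p_b)  # eq. 1
p_b_given_a = conditional(p_a_and_b, p_a)  # eq. 2

# Multiplication rule: both products should recover P(A and B).
joint_from_eq1 = p_a_given_b * p_b
joint_from_eq2 = p_b_given_a * p_a

print(p_a_given_b, p_b_given_a)
print(joint_from_eq1, joint_from_eq2)
```

Note that P(A|B) and P(B|A) differ here even though both are built from the same joint probability.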
BAYES’ THEOREM
Bayes’ theorem is developed from the definition of conditional probability.
Bayes’ theorem is used to revise previously calculated probabilities based on new
information, i.e. it gives the probability of one event based on the occurrence of some
other event.
The conditional probability P(A|B) is, in general, not the same as P(B|A).
Bayes’ Theorem is a way to convert probabilities of the form 𝑃(𝐵|𝐴) into probabilities of
the form 𝑃(𝐴|𝐵).
To find the probability of A given B:
$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|A') \times P(A')}$$
where A′ denotes the event "not A".
The equation above is known as Bayes’ theorem and is a statement about the probability
of the event A, conditional upon B having occurred.
Example:
i) A company expects that there is a 5% probability that the economy will expand. Furthermore,
there is a 90% probability that the company’s revenue will rise if the economy expands. If the
economy does not expand, there is only a 40% probability that the company’s revenue will rise.
ii) Assignment: Kakuku Shoe Company uses nails as one of its inputs. The company buys 60%
of the nails from Company A, 15% from Company B and the rest from Company C. The nails
bought from Company A have a 12% breakage rate, those from Company B an 8% breakage rate
and those from Company C a 12% breakage rate.
a. If a nail is picked randomly, what is the probability that it was bought from
company A?
b. If a selected nail breaks, what is the probability it was bought from company A?
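Example i) above does not state which probability is wanted. The sketch below assumes the intended question is the probability that the economy expanded given that revenue rose, with A = "economy expands" and B = "revenue rises". The function name `bayes` is an assumption made for illustration.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) by Bayes' theorem for the partition {A, not A}."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator

# Example i): P(A) = 0.05, P(B|A) = 0.90, P(B|not A) = 0.40.
p_expand_given_rise = bayes(p_b_given_a=0.90, p_a=0.05, p_b_given_not_a=0.40)
print(round(p_expand_given_rise, 4))  # 0.1059
```

Under that assumption, observing a rise in revenue lifts the probability of an expansion from the prior 5% to roughly 10.6%.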
Various indices can also be distinguished on the basis of the number of commodities that go
into the construction of an index. Indices constructed for individual commodities or variables
are termed simple index numbers. Those constructed for a group of commodities or
variables are known as aggregative (or composite) index numbers.
Example 1
Given the following price-quantity data of fish, with price quoted in Ksh per kg and production
in Kgs.
a) The price index for each year taking price of 1980 as base
Solution:
Given such an index, it is easy to find the percent by which the price/quantity may have changed
in a given period as compared to the base period.
For example, observing the index computed in the above example 1, one can firmly say that the
output of fish was 30 per cent more in 1984 as compared to 1980.
It may also be noted that, given the price/quantity for the base year and the index for
period i = 1, 2, 3, …, the actual price for period i may easily be obtained as:
$$p_i = p_0 \left( \frac{P_{0i}}{100} \right)$$
For example, with a base-year price of 15 and an index of 120:
$$p_i = 15 \times \left( \frac{120}{100} \right) = 18$$
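The relationship between a simple index and the actual price runs in both directions. A minimal sketch, using the base price of 15 and index of 120 from the calculation above; the function names are assumptions made for illustration.

```python
def simple_index(p_current, p_base):
    """Simple price index: current price as a percentage of the base price."""
    return p_current / p_base * 100

def price_from_index(p_base, index):
    """Recover the actual price from the base price and the index."""
    return p_base * index / 100

recovered_price = price_from_index(15, 120)
print(recovered_price)                            # 18.0
print(round(simple_index(recovered_price, 15), 2))
```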
4.2. Composite index numbers
When considering price/quantity changes in several commodities a composite index number is
used.
An aggregate price index is developed for the specific purpose of measuring the combined
change of a group of items.
Depending upon the method used for constructing an index, composite indices may be:
a. Simple Aggregative Price Index
b. Index of Average of Price Relatives
c. Weighted Aggregative Price Index
d. Index of Weighted Average of Price Relatives
Example 2
Given the following price-quantity data,
Item      2016 Price   2016 Production   2017 Price   2017 Production
Fish      15           500               20           600
Mutton    18           590               23           640
Chicken   22           450               24           500
Find the simple Aggregative price index with 2016 as base
Solution:
Item      Price 2016   Price 2017
Fish      15           20
Mutton    18           23
Chicken   22           24
Sum       55           67
a) Simple aggregative price index with 2016 as the base
$$P_{0i} = \frac{\sum p_i}{\sum p_0} \times 100 = \frac{67}{55} \times 100 = 121.82$$
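The same calculation as a short Python sketch, using the Example 2 prices. The dictionary layout and function name are assumptions made for illustration.

```python
base_prices = {"Fish": 15, "Mutton": 18, "Chicken": 22}     # 2016
current_prices = {"Fish": 20, "Mutton": 23, "Chicken": 24}  # 2017

def simple_aggregative_index(current, base):
    """Sum of current prices over sum of base prices, times 100."""
    return sum(current.values()) / sum(base.values()) * 100

index_2017 = simple_aggregative_index(current_prices, base_prices)
print(round(index_2017, 2))  # 121.82
```

Because the prices are summed directly, this index depends on the units in which each price is quoted.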
b) Index of Average of Price Relatives
Computed in the following three steps:
(i) After selecting the base year, find the price relative of each commodity for each year
with respect to the base year price. As defined earlier, the price relative of a commodity
for a given period is the ratio of the price of that commodity in the given period to its
price in the base period.
(ii) Multiply the result for each commodity by 100, to get simple price indices for each
commodity.
(iii) Take the average of the simple price/quantity indices using the arithmetic mean.
Example 3
From the data in Example 2, find the index of average of price relatives (base year 2016).
Solution:
Item      Price relative: (p_i / p_0) × 100
Fish      133.33
Mutton    127.78
Chicken   109.09
Sum       370.20
$$P_{0i} = \frac{\sum \left( \frac{p_i}{p_0} \times 100 \right)}{N} = \frac{370.20}{3} = 123.40$$
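A minimal sketch of the three steps for the index of average of price relatives, using the Example 2 prices. Values are kept at full precision and rounded only at the end; the names are assumptions made for illustration.

```python
base_prices = {"Fish": 15, "Mutton": 18, "Chicken": 22}     # 2016
current_prices = {"Fish": 20, "Mutton": 23, "Chicken": 24}  # 2017

def price_relatives(current, base):
    """Steps (i) and (ii): price relative of each item, multiplied by 100."""
    return {item: current[item] / base[item] * 100 for item in base}

def average_of_price_relatives(current, base):
    """Step (iii): arithmetic mean of the simple price indices."""
    relatives = price_relatives(current, base)
    return sum(relatives.values()) / len(relatives)

relatives = price_relatives(current_prices, base_prices)
index_2017 = average_of_price_relatives(current_prices, base_prices)

for item, value in relatives.items():
    print(item, round(value, 2))
print(round(index_2017, 2))
```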
c) Weighted Aggregative Price Index
Assigns proper weights to individual items.
Two methods of computing a weighted price index are the Laspeyres method and the Paasche
method.
They differ only in the period used for weighting i.e. Laspeyres method uses base-period
weights, whereas the Paasche method uses current-year weights.
i) Laspeyres’ Index
Laspeyres’ price index, using base-period quantities as weights, is computed as:
$$L = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100$$
Item      p0·q0    p1·q0    p0·q1    p1·q1
Fish      7500     10000    9000     12000
Mutton    10620    13570    11520    14720
Chicken   9900     10800    11000    12000
Sum       28020    34370    31520    38720
$$L = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100 = \frac{34370}{28020} \times 100 = 122.66$$
b. Paasche’s Price Index for 2017, using 2016 as the base
$$P = \frac{\sum p_i q_i}{\sum p_0 q_i} \times 100 = \frac{38720}{31520} \times 100 = 122.84$$
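Both weighted indices can be reproduced from the Example 2 prices and production figures. This is a sketch under the assumption that production is used as the quantity weight; the list layout and function names are illustrative.

```python
# Order of items: Fish, Mutton, Chicken.
p0 = [15, 18, 22]     # 2016 prices (base)
q0 = [500, 590, 450]  # 2016 production (base)
p1 = [20, 23, 24]     # 2017 prices (current)
q1 = [600, 640, 500]  # 2017 production (current)

def weighted_sum(prices, quantities):
    """Sum of price times quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, q0, p1):
    """Price index with base-period quantities as weights."""
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100

def paasche(p0, p1, q1):
    """Price index with current-period quantities as weights."""
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

L = laspeyres(p0, q0, p1)
P = paasche(p0, p1, q1)
print(round(L, 2))  # 122.66
print(round(P, 2))  # 122.84
```

The only difference between the two functions is which period's quantities appear in both the numerator and the denominator.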
NB: An important point to note is that Paasche’s index has a downward bias and
Laspeyres’ index an upward bias. This follows directly from the fact that Paasche’s index,
relative to Laspeyres’ index, shows a smaller rise when prices in general are rising,
and a greater fall when prices in general are falling. In other words, Laspeyres’ index
tends to overweight goods whose prices have increased. Paasche’s index, on the other hand,
tends to overweight goods whose prices have gone down.
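Fisher’s ideal price index for the same data is the geometric mean of the Laspeyres and Paasche indices:
$$P_{0i}^{F} = \sqrt{L \times P} = \sqrt{122.66 \times 122.84} = 122.75$$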
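Fisher's ideal index can be sketched in Python as the geometric mean of the Laspeyres and Paasche ratios, here computed from unrounded values using the Example 2 data. The sketch also checks the time reversal property by swapping the base and current periods; the function names are assumptions made for illustration.

```python
from math import sqrt

# Order of items: Fish, Mutton, Chicken.
p0 = [15, 18, 22]     # 2016 prices
q0 = [500, 590, 450]  # 2016 production
p1 = [20, 23, 24]     # 2017 prices
q1 = [600, 640, 500]  # 2017 production

def weighted_sum(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

def fisher_ratio(p_base, q_base, p_cur, q_cur):
    """Fisher's ideal index as a ratio (multiply by 100 for index points)."""
    laspeyres = weighted_sum(p_cur, q_base) / weighted_sum(p_base, q_base)
    paasche = weighted_sum(p_cur, q_cur) / weighted_sum(p_base, q_cur)
    return sqrt(laspeyres * paasche)

f_01 = fisher_ratio(p0, q0, p1, q1)  # 2017 on base 2016
f_10 = fisher_ratio(p1, q1, p0, q0)  # 2016 on base 2017

print(round(f_01 * 100, 2))   # 122.75
print(round(f_01 * f_10, 6))  # 1.0, i.e. the time reversal test holds
```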
1. Unit test
The unit test requires that the formula for constructing an index should be independent of
the units in which prices or quantities of the various commodities are quoted.
For example, in a group of commodities, the price of wheat might be quoted per kg, that of
vegetable oil per litre and that of soap per unit.
Except for the simple aggregative index, all other formulae discussed above satisfy this test.
4. Circular test
This test is an extension of the time reversal test and considers more than two periods.
It is based on the shiftability of the base period.
Among the index numbers discussed, only the simple aggregative index satisfies it.
Illustration 1:
Using the data below, compute Fisher’s ideal index and show that it satisfies the time reversal test.
Solution:
In this problem, the value of each item is given, hence we have to find the quantity by dividing
the value by the price, i.e.
$$\text{Value} = \text{Price} \times \text{Quantity} \quad \therefore \quad \text{Quantity} = \frac{\text{Value}}{\text{Price}}$$
Calculation of Fisher’s ideal price index:
Since $P_{01} \times P_{10} = 1$, Fisher’s ideal index satisfies the time reversal test.
Illustration 2:
Using the data below, calculate Fisher’s price index number and show that it satisfies both
the Time Reversal Test and the Factor Reversal Test.
Solution:
Assignment:
Y 25 50 75 100 125
The ratio of change in the above example is the same. It is, thus, a case of linear
correlation.
If these variables are plotted on graph paper, all the points will fall on the same straight
line.
On the other hand, if the amount of change in one variable does not follow a constant
ratio with the change in another variable, it is a case of non-linear or curvilinear
correlation. If a couple of figures in either series X or series Y are changed, it would
give a non-linear correlation (e.g. quadratic relationships, cubic relationships,
exponential relationships, logarithmic relationships etc.)
Simple and Multiple Correlation: The distinction between these types of correlation
depends upon the number of variables involved in a study.
If only two variables are involved in a study, then the correlation is said to be simple
correlation.
When three or more variables are involved in a study, then it is a problem of multiple
correlation.
Positive and Negative Correlation: refers to the nature of correlation i.e. the direction of
movement
If both the variables move in the same direction, then there is a positive correlation,
i.e., if one variable increases, the other variable also increases on an average or if one
variable decreases, the other variable also decreases on an average.
Alternatively, if the variables move in opposite directions, then there is negative
correlation; e.g., movements of demand
Correlation analysis
Correlation analysis is a group of techniques to measure the relationship between variables.
It is also used along with regression analysis to measure how well the regression line explains
the variations of the dependent variable with the independent variable.
The commonly used methods for studying the relationship between two variables involve both
graphic and algebraic methods. Some of the widely used methods include:
(i) Pearson’s Coefficient of Correlation
(ii) Spearman’s Rank Correlation
(iii) Kendall’s Tau / Concurrent Deviation Method - calculating Kendall’s Tau
manually is very tedious, so it is rarely done without a computer.
(iv) Scatter Diagram
(v) Correlation Graph
Only the first two are considered for this unit
Company               1    2    3    4    5    6    7    8    9    10
Advertisement slots   35   40   38   44   67   64   59   69   25   50
Profits               112  128  130  138  158  162  140  175  125  142

Pearson’s correlation coefficient

Method 1:
$$r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \, \sqrt{\sum (Y - \bar{Y})^2}}$$

Method 2:
$$r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2} \, \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
**Calculate the Pearson’s correlation coefficient for the data on advertisement slots and
profits
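Both formulas for Pearson's coefficient can be sketched in plain Python and compared on the advertisement slots and profits data. The function names are assumptions made for illustration; the two methods should agree up to floating-point rounding.

```python
from math import sqrt

slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]
profits = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]

def pearson_deviations(x, y):
    """Method 1: based on deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sqrt(sxx) * sqrt(syy))

def pearson_raw_sums(x, y):
    """Method 2: based on raw sums, convenient for hand calculation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

r1 = pearson_deviations(slots, profits)
r2 = pearson_raw_sums(slots, profits)
print(round(r1, 2), round(r2, 2))  # both about 0.9
```

A value close to +1 indicates a strong positive linear relationship between advertisement slots and profits.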
Example 1
Ten entries are submitted for a competition. Three judges study each entry and list the ten in
rank order. Their rankings are as follows:
Entry A B C D E F G H I J
Judge J1 9 3 7 5 1 6 2 4 10 8
Judge J2 9 1 10 4 3 8 5 2 7 6
Judge J3 6 3 8 7 2 4 1 5 9 10
Calculate the appropriate rank correlation and comment on which pair of judges agrees the
most and which pair of judges disagrees the most.
Solution:
        Rank by judges   Difference in ranks
Entry   J1   J2   J3     d(J1,J2)   d²   d(J1,J3)   d²   d(J2,J3)   d²
A       9    9    6      0          0    3          9    3          9
B       3    1    3      2          4    0          0    -2         4
C       7    10   8      -3         9    -1         1    2          4
D       5    4    7      1          1    -2         4    -3         9
E       1    3    2      -2         4    -1         1    1          1
F       6    8    4      -2         4    2          4    4          16
G       2    5    1      -3         9    1          1    4          16
H       4    2    5      2          4    -1         1    -3         9
I       10   7    9      3          9    1          1    -2         4
J       8    6    10     2          4    -2         4    -4         16
∑d²                                 48              26              88
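$$\rho(J_1 \,\&\, J_2) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 48}{10(10^2 - 1)} = 1 - \frac{288}{990} = 1 - 0.29 = +0.71$$
$$\rho(J_1 \,\&\, J_3) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 26}{10(10^2 - 1)} = 1 - \frac{156}{990} = 1 - 0.1576 = +0.84$$
$$\rho(J_2 \,\&\, J_3) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 88}{10(10^2 - 1)} = 1 - \frac{528}{990} = 1 - 0.53 = +0.47$$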
Conclusion: Judges J1 and J3 agree the most, while Judges J2 and J3 disagree the most
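The three pairwise coefficients for the judges in Example 1 can be reproduced with a short sketch. It assumes the inputs are already ranks with no ties; the function name is illustrative.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rho = 1 - 6*sum(d^2) / (N(N^2 - 1)), for untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Rankings of entries A to J by each judge.
j1 = [9, 3, 7, 5, 1, 6, 2, 4, 10, 8]
j2 = [9, 1, 10, 4, 3, 8, 5, 2, 7, 6]
j3 = [6, 3, 8, 7, 2, 4, 1, 5, 9, 10]

rho_12 = spearman_from_ranks(j1, j2)
rho_13 = spearman_from_ranks(j1, j3)
rho_23 = spearman_from_ranks(j2, j3)

print(round(rho_12, 2))  # 0.71
print(round(rho_13, 2))  # 0.84
print(round(rho_23, 2))  # 0.47
```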
- Spearman’s rank correlation formula can also be used when presented with quantitative variables,
i.e. when the actual data, rather than the ranks, relating to two variables are given.
- In such a case we have to convert the data into ranks. The highest (or the smallest) observation
is given rank 1. The next highest (or the next lowest) observation is given rank 2 and so on. It
makes no difference in which way (descending or ascending) the ranks are assigned. However,
the same approach should be followed for all the variables under consideration.
**Example 2: Calculate the Spearman’s rank correlation coefficient for the data on
advertisement slots and profits
Example 3
Calculate the rank coefficient of correlation from the following data:
X 75 88 95 70 60 80 81 50
Y 120 134 150 115 110 140 142 100
Solution:
X    Rank Rx   Y     Rank Ry   d = Rx - Ry   d²
75   5         120   5         0             0
88   2         134   4         -2            4
95   1         150   1         0             0
70   6         115   6         0             0
60   7         110   7         0             0
80   4         140   3         1             1
81   3         142   2         1             1
50   8         100   8         0             0
∑d² = 6
$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 6}{8(8^2 - 1)} = 1 - \frac{36}{504} = 1 - 0.07 = +0.93$$
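When raw data rather than ranks are given, the ranking step can be automated. The sketch below ranks the Example 3 data in descending order (rank 1 = largest) and assumes there are no tied values, as is the case here; the function names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rho = 1 - 6*sum(d^2) / (N(N^2 - 1)), for untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]

rx = ranks_descending(x)
ry = ranks_descending(y)
rho = spearman_from_ranks(rx, ry)

print(rx)             # [5, 2, 1, 6, 7, 4, 3, 8]
print(ry)             # [5, 4, 1, 6, 7, 3, 2, 8]
print(round(rho, 2))  # 0.93
```

Ranking both series in ascending order instead would give the same coefficient, since every difference d keeps its magnitude.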
The two variables may show similar movements but in reality there does not seem to be a common
link between them.
Assignment:
1. The data on price and quantity purchased relating to a commodity for 5 months is given below:
Month January February March April May
Prices 10 10 11 12 12
Quantity 5 6 4 3 3
Find the Pearson correlation coefficient between price and quantity and comment on its sign and
magnitude.
2. Calculate Spearman’s rank correlation coefficient between advertisement cost (X) and sales
(Y) from the following data:
X 39 65 62 90 82 75 25 98 36 78
Y 47 53 58 86 62 68 60 91 51 84
REGRESSION
WHAT IS REGRESSION?
The previous lesson evaluated the direction and the significance of the linear relationship between
two variables by finding the correlation coefficient. If the correlation coefficient is significantly
different from zero, then the next step is to develop an equation to express the linear relationship
between the two variables.
Regression analysis is one of the most commonly used statistical techniques in the social,
behavioral and physical sciences. It involves identifying and evaluating the
relationship between a dependent (a.k.a. response) variable and one or more independent variables
(also called predictor or explanatory variables).
It involves modeling the relationship between two or more variables.
Once the relation between the dependent variable and independent variable(s) has been modelled
using a regression equation, various tests are then employed to determine if the model is
satisfactory.
If the model is deemed satisfactory, the estimated regression equation can be used to predict the
value of the dependent variable given values for the independent variables
There are two types of linear regression:
o Simple linear regression - involves a single continuous dependent variable and a single
independent variable. For example, when trying to predict the demand for television sets
based on the prices of the televisions, the demand for television sets is the dependent
variable while the price of the television sets is the independent variable.
o Multiple linear regression - regression involving two or more independent variables at a
time. For example, when trying to predict the demand for television sets based on the prices
of the televisions, the price of radios, number of advertisements etc., the demand for
television sets is the dependent variable while the prices of the television sets, the price
of radios, number of advertisements etc. are the independent variables.
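Method 1:
$$\beta = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - \beta \bar{X}$$

Method 2 (instead of directly computing β, we may first compute the value of a):
$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2}, \qquad \beta = \frac{\bar{Y} - a}{\bar{X}}$$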
N is the number of cases/ individuals
** Determine and interpret the regression equation for the data provided on advertisement slots and
profits
**Estimate the profits earned by a company that advertises 42 times in a month
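A minimal sketch of the regression of profits (Y) on advertisement slots (X), using the raw-sum formulas for the slope β and intercept a and then estimating profits at 42 slots. The function names are assumptions made for illustration, and the sketch covers only the fitting step, not the tests of model adequacy.

```python
slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]               # X
profits = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]   # Y

def fit_line(x, y):
    """Return (a, beta) for the least-squares line Y = a + beta * X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - beta * sum_x / n  # a = mean(Y) - beta * mean(X)
    return a, beta

def intercept_direct(x, y):
    """Intercept computed first, without going through the slope."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    return (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)

a, beta = fit_line(slots, profits)
predicted_profit = a + beta * 42

print(round(a, 2), round(beta, 3))
print(round(predicted_profit, 1))
```

The slope is read as the average change in profits for one additional advertisement slot, and the intercept as the estimated profit with no advertisement slots.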
Assignment: Determine the regression of X on Y for the same data (Hint: The dependent variable is now
advertisement, while the independent variable is profits)
regression is a mathematical measure expressing the average relationship between the two
variables.
2. Correlation coefficient 𝑟 between two variables X and Y is symmetric, i.e., 𝒓𝒚𝒙 = 𝒓𝒙𝒚 and it is
immaterial which of X and Y is dependent variable and which is independent variable.
Regression coefficients are not symmetric in X and Y, i.e., 𝜷𝒚𝒙 ≠ 𝜷𝒙𝒚 and reflect upon the nature
of the variable, i.e., which is dependent variable and which is independent variable
3. Correlation need not imply cause and effect relationship between the variables under study.
However, regression analysis presupposes a cause and effect relationship between the
variables; it does not by itself prove causation. The variable corresponding to cause is taken
as independent variable and the variable corresponding to effect is taken as dependent variable.
4. Correlation coefficient 𝑟𝑥𝑦 is a relative measure of the linear relationship between X and Y and
is independent of the units of measurement. It is a pure number lying between −1 and +1.
On the other hand, the regression coefficients, 𝛽𝑦𝑥 𝑎𝑛𝑑 𝛽𝑥𝑦 are absolute measures representing
the change in the value of the variable Y (or X), for a unit change in the value of the variable X
(or Y). Once the functional form of regression curve is known; by substituting the value of the
independent variable we can obtain the value of the dependent variable and this value will be in
the units of measurement of the dependent variable.