
Lecture 5, 6, 7 - Business Statistics

Uploaded by

lillymalahilu
Copyright
© © All Rights Reserved

LECTURES 5, 6, 7_BUSINESS STATISTICS

TOPIC 5: CENSUS AND PROBABILITY SAMPLING


Basic probability concepts
 A probability is a numeric value (a proportion or fraction) ranging from 0 to 1 that
represents the chance, likelihood, or possibility that a particular event will occur, e.g. the
price of a stock increasing, a rainy day, a defective product, or the outcome of five dots in a
single toss of a die.
 Each possible outcome of a variable is referred to as an event.
 There are two main types of events:
i) A simple event is described by a single characteristic or instance e.g., in tossing a
coin, the two possible simple events are head and tail.
ii) A joint event is an event that has two or more characteristics, e.g. getting two heads
when a coin is tossed twice is a joint event because it consists of heads on
the first toss and heads on the second toss.
 The collection of all the possible events is called the sample space e.g. the sample space for
tossing a coin consists of heads and tails.
 The complement of an event A (represented by the symbol A′ or Aᶜ) includes all events
that are not part of A, i.e. it is the subset of outcomes in the sample space that are not
in the event. For example, the complement of a head is a tail because that is the only event
that is not a head.
 An event that has no chance of occurring is referred to as an impossible event and has a
probability of 0 whereas a certain event is one that is sure to occur and has a probability of 1.
 There are three types of probability:
i) A priori (classical)
ii) Empirical
iii) Subjective
 In a priori probability, the probability of an occurrence is based on prior knowledge of the
process involved. In the simplest case, where each outcome is equally likely, the chance of
occurrence of the event is defined as:
$$\text{probability of occurrence} = \frac{\text{number of ways in which the event occurs}}{\text{total number of possible outcomes}}$$
E.g. consider a standard deck of cards that has 26 red cards and 26 black cards. The
probability of selecting a black card is 26/52 because there are 26 black cards and 52 total
cards.
 In the empirical probability approach, the probabilities are based on observed data, not on
prior knowledge of a process. Surveys are often used to generate empirical probabilities.
Examples of this type of probability are the proportion of registered voters who prefer a certain
student leader, and the proportion of students who have part-time jobs. For example, if you
take a survey of students, and 60% state that they have part-time jobs, then there is a 0.60
probability that an individual student has a part-time job.
 Subjective probability is usually based on a combination of an individual’s past experience,
personal opinion, and analysis of a particular situation, and differs from person to person. For
example, the development team for a new product may assign a probability of 0.60 to the chance
of success for the product, while the president of the company may be less optimistic and assign
a probability of 0.30.
 Two events are mutually exclusive if both cannot occur simultaneously. This is not the same
as independence: two events are independent if the occurrence of one does not change the
probability of the other.
 A set of events is collectively exhaustive if one of the events must occur.
For example, heads and tails in a coin toss are mutually exclusive events: the result of a coin
toss cannot simultaneously be a head and a tail. Heads and tails in a coin toss are also
collectively exhaustive events: one of them must occur. If heads does not occur, tails must occur.

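RULES OF PROBABILITY

Multiplication rule (associated word: “and”)
   For any two events A and B:  $P(A \text{ and } B) = P(A) \times P(B \mid A)$, or equivalently $P(A \text{ and } B) = P(B) \times P(A \mid B)$
   For independent events:      $P(A \text{ and } B) = P(A) \times P(B)$

Additive rule (associated word: “or”)
   For any two events A and B:  $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
   For mutually exclusive events:  $P(A \text{ or } B) = P(A) + P(B)$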
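The rules above can be checked by counting outcomes directly. The following is a minimal sketch, assuming a standard 52-card deck as the sample space; the events (king, queen, heart) and the helper `prob` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs (illustrative sample space).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]


def prob(event):
    """A priori probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# General additive rule: kings and hearts overlap (the king of hearts).
p_king_or_heart = prob(lambda c: is_king(c) or is_heart(c))
general_rule = prob(is_king) + prob(is_heart) - prob(lambda c: is_king(c) and is_heart(c))

# Mutually exclusive events: a card cannot be both a king and a queen.
p_king_or_queen = prob(lambda c: is_king(c) or is_queen(c))
exclusive_rule = prob(is_king) + prob(is_queen)

# Independent events: the rank of a card tells us nothing about its suit.
p_king_and_heart = prob(lambda c: is_king(c) and is_heart(c))
independent_rule = prob(is_king) * prob(is_heart)

print(p_king_or_heart, general_rule)       # both sides of the general additive rule
print(p_king_or_queen, exclusive_rule)     # both sides of the additive rule, exclusive case
print(p_king_and_heart, independent_rule)  # both sides of the multiplication rule, independent case
```

Exact fractions are used so that the two sides of each rule can be compared without rounding error.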

Conditional probability
 Refers to the probability of an event A given information about the occurrence of another event
B, i.e. it is used to quantify how probabilities change in light of new information.
 In some cases, there is additional knowledge that might affect the outcome of an experiment,
hence the need to alter the probability of an event of interest. A probability that reflects such
additional knowledge is called the conditional probability of the event.
 The probability of A given B is:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \qquad \text{(eq. 1)}$$
 Similarly, the probability of B given A is:
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} \qquad \text{(eq. 2)}$$
where P(A and B) is the joint probability of A and B, P(A) is the marginal probability of A, and
P(B) is the marginal probability of B.
 The | symbol is read “given” and the event after the | symbol represents information that is
known.
 The above formulas allow for derivation of the general multiplication rule, which holds for
any two events, i.e.
$$P(A \text{ and } B) = P(A \mid B) \times P(B) \qquad \text{(from eq. 1)}$$
$$P(A \text{ and } B) = P(B \mid A) \times P(A) \qquad \text{(from eq. 2)}$$
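As a small illustration of the conditional probability formula and the multiplication rule, the sketch below enumerates all ordered draws of two cards without replacement. The scenario (A = first card is an ace, B = second card is an ace) and the names used are assumptions made for the example only.

```python
from fractions import Fraction
from itertools import permutations

# Cards are numbered 0..51; by assumption, cards 0-3 are the four aces.
ACES = {0, 1, 2, 3}

# Sample space: every ordered pair of distinct cards (two draws, no replacement).
outcomes = list(permutations(range(52), 2))


def prob(event):
    """Probability as favourable outcomes / total equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_a = prob(lambda o: o[0] in ACES)                          # P(A): first card is an ace
p_b = prob(lambda o: o[1] in ACES)                          # P(B): second card is an ace
p_a_and_b = prob(lambda o: o[0] in ACES and o[1] in ACES)   # joint probability

# Conditional probabilities from the definition.
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

# Multiplication rule: both products should reproduce the joint probability.
print(p_b_given_a, p_a * p_b_given_a, p_b * p_a_given_b, p_a_and_b)
```

Knowing that the first card was an ace lowers the chance of a second ace from 4/52 to 3/51, which is exactly what the conditional probability captures.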


BAYES’ THEOREM
 Bayes’ theorem is developed from the definition of conditional probability.
 Bayes’ theorem is used to revise previously calculated probabilities based on new
information, i.e. it gives the probability of one event given that some other event has
occurred.
 The conditional probability P(A|B) is, in general, not the same as P(B|A).
 Bayes’ theorem is a way to convert probabilities of the form P(B|A) into probabilities of
the form P(A|B).
 To find the probability of A given B:
$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A) \times P(A) + P(B \mid A') \times P(A')}$$
where A′ (“not A”) is the complement of A.
 The equation above is known as Bayes’ theorem and is a statement about the probability
of the event A, conditional upon B having occurred.
Example:
i) A company expects that there is a 5% probability that the economy will expand. Furthermore,
there is a 90% probability that the company’s revenue will rise if the economy expands. If the
economy does not expand, there is only 40% probability that company’s revenue will rise.
ii) Assignment: Kakuku Shoe Company uses nails as one of its inputs. The company buys 60%
of the nails from Company A, 15% from Company B and the rest from Company C. The nails
bought from Company A have a 12% breakage rate, those from Company B an 8% breakage rate and
those from Company C a 12% breakage rate.
a. If a nail is picked randomly, what is the probability that it was bought from
company A?
b. If a selected nail breaks, what is the probability it was bought from company A?
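Example (i) above gives the probabilities but does not state the question. A minimal sketch follows, assuming the question is: given that the company’s revenue rose, what is the probability that the economy expanded? Here A = “economy expands” and B = “revenue rises”; the function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem for an event A and its complement.

    Returns P(A | B) from P(A), P(B | A) and P(B | not A).
    """
    numerator = p_b_given_a * prior_a
    evidence = numerator + p_b_given_not_a * (1 - prior_a)  # total probability of B
    return numerator / evidence


# Figures from Example (i): P(expand) = 0.05, P(rise | expand) = 0.90,
# P(rise | no expansion) = 0.40. The question itself is an assumption.
p_expand_given_rise = bayes_posterior(0.05, 0.90, 0.40)
print(round(p_expand_given_rise, 4))
```

Under this reading the posterior is 0.045 / 0.425, about 0.106, so a rise in revenue roughly doubles the 5% prior but still leaves expansion unlikely.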


TOPIC 6 : INDEX NUMBERS (Self study by students)


1. INTRODUCTION
 In business, managers and other decision makers may be concerned with the way in which the
values of variables change over time like prices paid for raw materials, numbers of employees
and customers, annual income and profits, and so on. Index numbers are one way of describing
such changes.
 If we turn to any journal devoted to economic and financial matters, we are very likely to come
across an index number of one or the other type e.g. it may be an index number of share prices
or a wholesale price index or a consumer price index or an index of industrial production.
 The objective of these index numbers is to measure the changes that have occurred in
prices, cost of living, production, and so forth. For example, if a wholesale price index
number for the year 2000 with base year 1990 was 170, it shows that wholesale prices, in
general, increased by 70 percent in 2000 as compared to those in 1990.
 Index numbers enable economists and businessmen to describe and appreciate business and
economic situations quantitatively.
- Definition: An index or index number measures the change in the level of a variable (typically
a product or service) between two time periods i.e. it is a number that expresses the relative
change in a certain phenomenon (e.g. price, quantity, or value) in a given period (current
period) compared to its value in a fixed period called the base period.

2. USES OF INDEX NUMBERS


The main use of an index number in business is to show the percent change in one or more
items from one time period to another.

Other uses include:

1. Index Numbers as Economic Barometers (indicators of changing business or economic activity)
- Index numbers are “economic barometers” since they measure the pressure of economic
and business behaviour.
- The indices of prices (wholesale & retail), output (volume of trade, import and export,
industrial and agricultural production) and bank deposits, foreign exchange and reserves
etc., show the nature of, and variation in the general economic and business activity of the
country.

2. Index Numbers Help in Studying Trends and Tendencies


- Since the index numbers study the relative change in the level of a phenomenon at different
periods of time, they are especially useful for the study of the general trend for a group
phenomenon in time series data (data collected over a period of years)
- For example, if an individual is interested in establishing a new business, the study of the
trend of changes in the prices, wages and incomes in different industries is extremely
helpful to him to frame a general idea of the comparative courses, which the future holds
for different undertakings.


3. Index Numbers Help in Formulating Decisions and Policies


- Index numbers of the data relating to various business and economic variables serve as an
important guide to the formulation of appropriate policy.
- For example, the excise duty on the production or sales of a commodity is regulated
according to the index numbers of the consumption of the commodity from time to time.
- Similarly, the indices of consumption of various commodities help in the planning of their
future production.

4. Price Indices Measure the Purchasing Power of Money


- A traditional use of index numbers is in measuring the purchasing power of money.
- Since the changes in prices and purchasing power of money are inversely related, an
increase in the general price index indicates that the purchasing power of money has gone
down.

3. CLASSIFICATION OF INDEX NUMBERS


Index numbers may be broadly classified into the following three categories:

a. Price Index Numbers


 The price index numbers measure the general changes in the prices. They are further sub-divided
into the following classes:
(i) Wholesale Price Index Numbers: These reflect the changes in the general price level of
a country.
(ii) Retail Price Index Numbers: These indices reflect the general changes in the retail prices
of various commodities such as consumption goods, stocks and shares, bank deposits,
government bonds, etc.
(iii) Consumer Price Index: Commonly known as the Cost of living Index, CPI is a
specialized kind of retail price index. It enables the study of the effect of changes in the
price of a basket of goods or commodities on the purchasing power or cost of living of
a particular class or section of the people like labour class, industrial or agricultural
worker, low income or middle income class etc.

b. Quantity Index Numbers


 Quantity index numbers study the changes in the volume of goods produced (manufactured),
consumed or distributed, like: the indices of agricultural production, industrial production,
imports and exports, etc.
 They are extremely helpful in studying the level of physical output in an economy.

c. Value Index Numbers


 These are intended to study the change in the total value (price multiplied by quantity) of output
such as indices of retail sales or profits or inventories. However, these indices are not as
common as price and quantity indices.
NB: THIS COURSE WILL STRICTLY CONCENTRATE ON PRICE INDEX NUMBERS

 Various indices can also be distinguished on the basis of the number of commodities that go
into the construction of an index. Indices constructed for individual commodities or variables
are termed simple index numbers. Those constructed for a group of commodities or
variables are known as aggregative (or composite) index numbers.

4. TYPES OF INDEX NUMBERS


There are several types of index numbers. Some of them are discussed in the sections below.

4.1. Simple index numbers


 A simple index number is based on the price or quantity of a single commodity.
 To construct a simple index, we first decide on the base period and then find the ratio of
the value in any subsequent period to the value in that base period (the price/quantity relative).
 This ratio is then converted to a percentage:
$$\text{Index for period } i = \frac{\text{Price in period } i}{\text{Price in base period}} \times 100$$
i.e. the simple price index for period i = 1, 2, 3, … will be
$$P_{0i} = \frac{p_i}{p_0} \times 100$$
 A simple price index shows how the current price per unit for a given item compares to the
base-period price per unit for the same item.
 As seen above, the simple price index expresses the unit price in each period as a percentage
of the unit price in the base period. This is illustrated in the following example.

Example 1
Given the following price-quantity data for fish, with price quoted in Ksh per kg and production
in kg:

Year               1980   1981   1982   1983   1984   1985
Price (Ksh/kg)       15     17     16     18     22     20
Production (kg)     500    550    480    610    650    600
Construct:

a) The price index for each year, taking the price of 1980 as base
Solution:

Year    Price ($p_i$)    Price index $P_{0i} = \frac{p_i}{p_0} \times 100$
1980    15               100
1981    17               (17/15) × 100 = 113.33
1982    16               106.67
1983    18               120
1984    22               146.67
1985    20               133.33
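The calculation in Example 1 can be sketched in a few lines. This is an illustrative sketch only; the function name `simple_index` is hypothetical, and the same function is applied to the production figures to obtain a simple quantity index.

```python
# Data from Example 1 (fish): price in Ksh per kg, production in kg.
prices = {1980: 15, 1981: 17, 1982: 16, 1983: 18, 1984: 22, 1985: 20}
production = {1980: 500, 1981: 550, 1982: 480, 1983: 610, 1984: 650, 1985: 600}


def simple_index(series, base_year):
    """Simple index: value in each period as a percentage of the base-period value."""
    base = series[base_year]
    return {year: round(value / base * 100, 2) for year, value in series.items()}


price_index = simple_index(prices, base_year=1980)
quantity_index = simple_index(production, base_year=1980)

for year in prices:
    print(year, price_index[year], quantity_index[year])
```

Changing `base_year` shows how every index value shifts when a different base period is chosen.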

 Given such an index, it is easy to find the percent by which the price/quantity may have changed
in a given period as compared to the base period.


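 For example, from the price index computed in Example 1, one can say that the price of fish
was 46.67 per cent higher in 1984 than in 1980. Applying the same method to the production
figures (650/500 × 100 = 130) shows that the output of fish was 30 per cent more in 1984 as
compared to 1980.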
 It may also be noted that, given the simple price/quantity for the base year and the index for the
period i = 1, 2, 3, …, the actual price for the period i may easily be obtained as:
$$p_i = p_0 \left(\frac{P_{0i}}{100}\right)$$
For example, with i = 1983, $P_{0i} = 120$ and $p_0 = 15$:
$$p_i = 15 \times \left(\frac{120}{100}\right) = 18$$
4.2. Composite index numbers
 When considering price/quantity changes in several commodities a composite index number is
used.
 An aggregate price index is developed for the specific purpose of measuring the combined
change of a group of items.
 Depending upon the method used for constructing an index, composite indices may be:
a. Simple Aggregative Price Index
b. Index of Average of Price Relatives
c. Weighted Aggregative Price Index
d. Index of Weighted Average of Price Relatives

a) Simple aggregative price index


 The simple aggregate price index is computed as:
$$P_{0i} = \frac{\sum p_i}{\sum p_0} \times 100$$
where $p_i$ = unit price for items in period i
      $p_0$ = unit price for items in period 0 (base period)

Example 2
Given the following price-quantity data,

                 2016                   2017
Item       Price   Production     Price   Production
Fish         15       500           20       600
Mutton       18       590           23       640
Chicken      22       450           24       500
Find the simple Aggregative price index with 2016 as base

Solution:

              Prices
Item       2016 ($p_0$)   2017 ($p_i$)
Fish          15             20
Mutton        18             23
Chicken       22             24
Sum           55             67
a) Simple aggregative price index with 2016 as the base
$$P_{0i} = \frac{\sum p_i}{\sum p_0} \times 100 = \frac{67}{55} \times 100 = 121.82$$
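A short sketch of the simple aggregative price index for the Example 2 data is shown below; the function name is an illustrative choice.

```python
# Prices from Example 2 (base year 2016, current year 2017).
prices_2016 = {"Fish": 15, "Mutton": 18, "Chicken": 22}
prices_2017 = {"Fish": 20, "Mutton": 23, "Chicken": 24}


def simple_aggregative_index(base_prices, current_prices):
    """Sum of current-period prices as a percentage of the sum of base-period prices."""
    return sum(current_prices.values()) / sum(base_prices.values()) * 100


index_2017 = simple_aggregative_index(prices_2016, prices_2017)
print(round(index_2017, 2))
```

Because the prices are simply added, an item quoted in a larger unit would dominate the sums, which is why this index does not satisfy the unit test.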
b) Index of Average of Price Relatives
 Computed in the following three steps:
(i) After selecting the base year, find the price relative of each commodity for each year
with respect to the base year price. As defined earlier, the price relative of a commodity
for a given period is the ratio of the price of that commodity in the given period to its
price in the base period.
(ii) Multiply the result for each commodity by 100, to get simple price indices for each
commodity.
(iii) Take the average of the simple price/quantity indices by using the arithmetic mean.

 Thus it is computed as:
$$P_{0i} = \frac{\sum \left(\frac{p_i}{p_0} \times 100\right)}{N}$$
where N is the number of commodities.

Example 3
From the data in Example 2, find the index of average of price relatives (base year 2016)

Solution:

Item       Price relative: $\frac{p_i}{p_0} \times 100$
Fish       133.33
Mutton     127.78
Chicken    109.09
Sum        370.20

$$P_{0i} = \frac{\sum \left(\frac{p_i}{p_0} \times 100\right)}{N} = \frac{370.20}{3} = 123.40$$
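The three steps for the index of average of price relatives can be sketched as follows, using the Example 2 prices. The variable names are illustrative.

```python
# Prices from Example 2 (base year 2016, current year 2017).
prices_2016 = {"Fish": 15, "Mutton": 18, "Chicken": 22}
prices_2017 = {"Fish": 20, "Mutton": 23, "Chicken": 24}

# Steps (i) and (ii): price relative of each commodity, expressed as a percentage.
relatives = {
    item: prices_2017[item] / prices_2016[item] * 100
    for item in prices_2016
}

# Step (iii): arithmetic mean of the simple price indices.
average_of_relatives = sum(relatives.values()) / len(relatives)

for item, value in relatives.items():
    print(item, round(value, 2))
print("Index:", round(average_of_relatives, 2))
```

Each relative is a pure ratio, so the result does not depend on the units in which the individual prices are quoted.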
c) Weighted Aggregative Price Index
 Assigns proper weights to individual items.


 Two methods of computing a weighted price index are the Laspeyres method and the Paasche
method.
 They differ only in the period used for weighting i.e. Laspeyres method uses base-period
weights, whereas the Paasche method uses current-year weights.

i) Laspeyres’ Index
Laspeyres’ price index, using base-period quantities as weights, is computed as:
$$L = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100$$
where $L$ is the price index
      $p_i$ is the price in the current period
      $p_0$ is the price in the base period
      $q_0$ is the quantity used in the base period
- In business/economics, the Laspeyres index is used as a primary measure of the cost of living,
i.e. the Consumer Price Index (CPI):
$$\text{CPI} = \text{Laspeyres index number} = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100$$

ii) Paasche’s Index


 The major disadvantage of the Laspeyres index is it assumes that the base-period quantities are
still realistic in the given period.
 The Paasche index is an alternative. The procedure is similar, but instead of using base-period
quantities as weights, we use current-period quantities as weights.
 Paasche’s price index, using current-period quantities as weights, is computed as:
$$P = \frac{\sum p_i q_i}{\sum p_0 q_i} \times 100$$
where $P$ is the price index
      $p_i$ is the price in the current period
      $p_0$ is the price in the base period
      $q_i$ is the quantity used in the current period
Example 4
Using the table from Example 2, find:

4.a.1. Laspeyres’ price index for 2017, using 2016 as base
4.a.2. Paasche’s price index for 2017, using 2016 as the base
Solution:
Calculations for Laspeyres’ and Paasche’s indices
(Base year = 2016; subscript 0 refers to 2016 and subscript 1 to 2017)

Item       $p_0 q_0$   $p_1 q_0$   $p_0 q_1$   $p_1 q_1$
Fish         7500       10000        9000       12000
Mutton      10620       13570       11520       14720
Chicken      9900       10800       11000       12000
Sum         28020       34370       31520       38720

a. Laspeyres’ Price Index for 2017, using 2016 as the base
$$L = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100 = \frac{34370}{28020} \times 100 = 122.66$$
b. Paasche’s Price Index for 2017, using 2016 as the base
$$P = \frac{\sum p_i q_i}{\sum p_0 q_i} \times 100 = \frac{38720}{31520} \times 100 = 122.84$$
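The Example 4 calculations can be reproduced with the short sketch below. The data layout (one tuple per item) and the function names are illustrative assumptions.

```python
# Example 2 data as (p0, q0, p1, q1): 2016 price and production, 2017 price and production.
data = {
    "Fish": (15, 500, 20, 600),
    "Mutton": (18, 590, 23, 640),
    "Chicken": (22, 450, 24, 500),
}


def laspeyres(rows):
    """Laspeyres price index: base-period quantities (q0) as weights."""
    numerator = sum(p1 * q0 for p0, q0, p1, q1 in rows)
    denominator = sum(p0 * q0 for p0, q0, p1, q1 in rows)
    return numerator / denominator * 100


def paasche(rows):
    """Paasche price index: current-period quantities (q1) as weights."""
    numerator = sum(p1 * q1 for p0, q0, p1, q1 in rows)
    denominator = sum(p0 * q1 for p0, q0, p1, q1 in rows)
    return numerator / denominator * 100


L = laspeyres(data.values())
P = paasche(data.values())
print(round(L, 2), round(P, 2))
```

The two functions differ only in which period’s quantities are used as weights, which mirrors the definitions of the two indices.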
NB: An important point to note is that the Paasche's index has a downward bias and the
Laspeyre's index an upward bias. This directly follows from the fact that the Paasche's index,
relative to the Laspeyre's index, shows a smaller rise when the prices in general are rising,
and a greater fall when the prices in general are falling. In other words Laspeyres’ index
tends to overweight goods whose prices have increased. Paasche’s index, on the other hand,
tends to overweight goods whose prices have gone down.

d) Index of Weighted Average of Price/Quantity Relatives


 To overcome the difficulty of overstatement of changes in prices by the Laspeyre's index and
understatement by the Paasche's index, different indices have been developed to compromise
and improve upon them.
 These include:
o Marshall-Edgeworth Index
o Dorbish and Bowley Index
o Fisher’s Ideal Index
- These are computed as follows:

Marshall-Edgeworth Index:
$$\frac{\sum (q_0 + q_i)\, p_i}{\sum (q_0 + q_i)\, p_0} \times 100$$
where $p_i$ is the price in the current period
      $p_0$ is the price in the base period
      $q_0$ is the quantity used in the base period
      $q_i$ is the quantity used in the current period

Dorbish and Bowley Index:
$$\frac{L + P}{2}$$
where L is Laspeyres’ index and P is Paasche’s index

Fisher’s Ideal Index:
- It is the geometric mean of the Laspeyres’ and Paasche’s indexes.
$$\sqrt{L \times P}$$
where L is Laspeyres’ index and P is Paasche’s index
 Fisher’s index seems to be theoretically ideal because it combines
the best features of both Laspeyres’ and Paasche’s. That is, it
balances the effects of the two indexes. However, it is rarely used
in practice because it has the same basic set of problems as
Paasche’s index: it requires that a new set of quantities be
determined for each period.
 The Fisher’s Ideal Index of the data in Example 4 is
$$P_{0i}^{F} = \sqrt{122.66 \times 122.84} = 122.75$$
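The three compromise indices can be computed side by side for the Example 4 data. This is a minimal sketch; the tuple layout and variable names are illustrative.

```python
from math import sqrt

# Example 2 data as (p0, q0, p1, q1): 2016 price and production, 2017 price and production.
rows = [
    (15, 500, 20, 600),   # Fish
    (18, 590, 23, 640),   # Mutton
    (22, 450, 24, 500),   # Chicken
]

# Laspeyres and Paasche price indices, needed for the two averages below.
L = sum(p1 * q0 for p0, q0, p1, q1 in rows) / sum(p0 * q0 for p0, q0, p1, q1 in rows) * 100
P = sum(p1 * q1 for p0, q0, p1, q1 in rows) / sum(p0 * q1 for p0, q0, p1, q1 in rows) * 100

# Marshall-Edgeworth: weights are the sum of base and current quantities.
marshall_edgeworth = (
    sum((q0 + q1) * p1 for p0, q0, p1, q1 in rows)
    / sum((q0 + q1) * p0 for p0, q0, p1, q1 in rows)
    * 100
)

# Dorbish and Bowley: arithmetic mean of Laspeyres and Paasche.
dorbish_bowley = (L + P) / 2

# Fisher's ideal index: geometric mean of Laspeyres and Paasche.
fisher = sqrt(L * P)

print(round(marshall_edgeworth, 2), round(dorbish_bowley, 2), round(fisher, 2))
```

For this data set Laspeyres and Paasche are very close, so all three compromise indices land between them at roughly 122.75.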

TESTS OF ADEQUACY OF INDEX NUMBERS


 So far we have discussed various formulae for construction of weighted and unweighted index
numbers.
 However the problem still remains of selecting an appropriate method for the construction of
an index number in a given situation.
 The following tests can be applied to find out the adequacy of an index number:
a) Unit test
b) Time reversal test
c) Factor reversal test
d) Circular test

1. Unit test
 The unit test requires that the formula for constructing an index should be independent of the
units in which prices or quantities of the various commodities are quoted.
 For example, in a group of commodities, the price of wheat might be quoted per kg, that of
vegetable oil per litre and that of soap per unit.
 Except for the simple aggregative index, all other formulae discussed above satisfy this test.


2. Time reversal test


 This test is used to test whether a given method will work both backwards and forwards with
respect to time.
 The test is that the formula should give the same ratio between one point of comparison and
another, no matter which of the two is taken as the base.
 It is stated more precisely as follows:
If the time subscripts of a price (or quantity) index number formula are interchanged, the
resulting price (or quantity) formula should be the reciprocal of the original formula.
 E.g. if $p_0$ represents the price in 2011 and $p_1$ represents the price in 2012, then
$$\frac{p_1}{p_0} = \frac{1}{p_0 / p_1}$$
 Symbolically, the following relation should be satisfied:
$$P_{01} \times P_{10} = 1, \quad \text{omitting the factor 100 from both sides}$$
where:
$P_{01}$ is the index for current year ‘1’ based on base year ‘0’
$P_{10}$ is the index for base year ‘0’ based on year ‘1’
 Among the methods discussed above, the following satisfy the time reversal test:
a) Simple aggregate index
b) Fisher’s ideal formula
c) Marshall-Edgeworth formula
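The time reversal test can be checked numerically. The sketch below uses the Example 2 data (an assumption, since the lecture’s own illustration data is not reproduced here) and compares Laspeyres’ index with Fisher’s ideal index; indices are kept as ratios, i.e. without the factor 100.

```python
from math import isclose, sqrt

# Example 2 data: prices and quantities for 2016 (period 0) and 2017 (period 1).
p0, q0 = [15, 18, 22], [500, 590, 450]
p1, q1 = [20, 23, 24], [600, 640, 500]


def laspeyres(p_base, q_base, p_cur, q_cur):
    """Laspeyres price ratio (factor 100 omitted); q_cur is unused by design."""
    return sum(p * q for p, q in zip(p_cur, q_base)) / sum(p * q for p, q in zip(p_base, q_base))


def paasche(p_base, q_base, p_cur, q_cur):
    """Paasche price ratio (factor 100 omitted); q_base is unused by design."""
    return sum(p * q for p, q in zip(p_cur, q_cur)) / sum(p * q for p, q in zip(p_base, q_cur))


def fisher(p_base, q_base, p_cur, q_cur):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p_base, q_base, p_cur, q_cur) * paasche(p_base, q_base, p_cur, q_cur))


# P01 uses 2016 as base; P10 interchanges the time subscripts (2017 as base).
laspeyres_product = laspeyres(p0, q0, p1, q1) * laspeyres(p1, q1, p0, q0)
fisher_product = fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0)

print("Laspeyres P01 x P10:", round(laspeyres_product, 4))
print("Fisher    P01 x P10:", round(fisher_product, 4))
```

For this data the Laspeyres product falls slightly short of 1, while the Fisher product equals 1 up to floating-point rounding.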

3. Factor reversal test


 An index number satisfies this test if the product of the Price index and Quantity Index gives
the True value ratio, omitting the factor 100 from each index
 It is satisfied if the change in the price multiplied by the change in quantity is equal to the
change in the value
 If p and q factors in a price (or quantity) index formula be interchanged, so that a quantity (or
price) index formula is obtained the product of the two indices should give the true value ratio.
 Symbolically:
$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0} = \text{the True Value Ratio (TVR)}$$
 This test is only met by Fisher’s ideal index. No other index number satisfies it.
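The factor reversal test can be checked in the same way. The sketch below again assumes the Example 2 data; the quantity index is obtained by interchanging the roles of p and q in the price formula, and all indices are ratios without the factor 100.

```python
from math import isclose, sqrt

# Example 2 data: prices and quantities for 2016 (period 0) and 2017 (period 1).
p0, q0 = [15, 18, 22], [500, 590, 450]
p1, q1 = [20, 23, 24], [600, 640, 500]


def dot(a, b):
    """Sum of pairwise products, e.g. sum of p*q over all items."""
    return sum(x * y for x, y in zip(a, b))


# Price indices (ratios).
laspeyres_price = dot(p1, q0) / dot(p0, q0)
paasche_price = dot(p1, q1) / dot(p0, q1)
fisher_price = sqrt(laspeyres_price * paasche_price)

# Quantity indices: the same formulas with p and q interchanged.
laspeyres_quantity = dot(q1, p0) / dot(q0, p0)
paasche_quantity = dot(q1, p1) / dot(q0, p1)
fisher_quantity = sqrt(laspeyres_quantity * paasche_quantity)

# True value ratio: total value in period 1 over total value in period 0.
true_value_ratio = dot(p1, q1) / dot(p0, q0)

print("TVR:", round(true_value_ratio, 4))
print("Fisher P x Q:", round(fisher_price * fisher_quantity, 4))
print("Laspeyres P x Q:", round(laspeyres_price * laspeyres_quantity, 4))
```

Only the Fisher product reproduces the true value ratio here; the Laspeyres product comes close but does not match.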

4. Circular test
 This test is an extension of the time reversal test and considers more than two periods.
 It is based on the shiftability of the base period.
 Among the index numbers discussed, only the simple aggregative index satisfies it.

Illustration 1:
Using the data below, compute Fisher’s ideal index and show that it satisfies the time reversal test.


Solution:
In this problem, the value of each item is given, hence we have to find the quantity by dividing
the value by the price, i.e.
$$\text{Value} = \text{Price} \times \text{Quantity} \quad \therefore \quad \text{Quantity} = \frac{\text{Value}}{\text{Price}}$$
Calculation of Fisher’s ideal price index:

The time reversal test is satisfied when $P_{01} \times P_{10} = 1$.

Since this condition holds, Fisher’s ideal index satisfies the time reversal test.

Illustration 2:
Using the data below, calculate Fisher’s price index number and show that it satisfies both the
Time Reversal Test and the Factor Reversal Test.

Solution:


Assignment:

1) Determine the simple price indexes.
2) Determine the simple aggregate price index.
3) Determine Laspeyres’ price index.
4) Determine the Paasche price index.
5) Determine Fisher’s ideal index.
6) Show that Fisher’s ideal index satisfies both the Time Reversal Test and the Factor Reversal
Test.

LIMITATIONS OF INDEX NUMBERS


1. They are based on sample data. In case sample size is extremely limited and its selection is
faulty in that the sample units have not been selected randomly, then index numbers will
give wrong figures.
2. A number of formulae can be used in index number construction. These will give different
results. As a result the individual using the index should know a little more about different
formulae and their effect on the magnitude of the index.
3. Index numbers with the same base and items are useful only for a short period. One has,
therefore, to ensure that an index does not use a very remote (old) year as the base.
4. An individual interpreting an index must be familiar with general aspects of the economy
and the factors relevant in this regard.
5. Indices are calculated for prices and quantities. The question is, does our index reflect a
change in the quality of a product or item? Also apart from quality changes, there are other
aspects which are pertinent (important) when interpreting the index numbers.


TOPIC 7 : CORRELATION AND REGRESSION


CORRELATION
Definition of correlation
 Correlation is a measure of the nature and degree of association between two or more
variables.
 Correlation is a branch of statistics that deals with mutual dependence or inter-relationship of
two or more variables.
 If the values of two variables are such that when one changes, the other changes too, then the
variables are said to be correlated.
 Correlation can be classified into three main categories:
(i) Linear and non-linear (curvilinear)
(ii) Simple and multiple
(iii) Positive and negative
 Linear and Non-linear (Curvilinear) Correlation: If the change in one variable is
accompanied by change in another variable in a constant ratio, then there is linear correlation.
Observe the following data:
X 10 20 30 40 50

Y 25 50 75 100 125

 The ratio of change in the above example is the same. It is, thus, a case of linear
correlation.
 If these variables are plotted on graph paper, all the points will fall on the same straight
line.
 On the other hand, if the amount of change in one variable does not follow a constant
ratio with the change in another variable, it is a case of non-linear or curvilinear
correlation. If a couple of figures in either series X or series Y are changed, it would
give a non-linear correlation (e.g. quadratic relationships, cubic relationships,
exponential relationships, logarithmic relationships etc.).
 Simple and Multiple Correlation: The distinction between these two types of correlation
depends upon the number of variables involved in a study.
 If only two variables are involved in a study, then the correlation is said to be simple
correlation.
 When three or more variables are involved in a study, then it is a problem of multiple
correlation.
 Positive and Negative Correlation: refers to the nature of correlation i.e. the direction of
movement
 If both the variables move in the same direction, then there is a positive correlation,
i.e., if one variable increases, the other variable also increases on an average or if one
variable decreases, the other variable also decreases on an average.
 Alternatively, if the variables move in opposite directions, then there is a negative
correlation; e.g., movements of demand


 Correlation is measured using the correlation coefficient. Numerically, this coefficient cannot
exceed 1 and is never less than −1, i.e. it lies between −1 and +1:
$$-1 \le r \le 1$$
 The sign of the correlation coefficient indicates the nature of the correlation i.e. Positive value
indicates positive correlation, whereas negative value indicates negative correlation and a
correlation coefficient = 0 indicates the absence of correlation.
 The value of the correlation coefficient is interpreted as follows. A value of
 r = +1 is a perfect positive correlation
 r = −1 is a perfect negative correlation
 r = 0 is no correlation (the values don’t seem linked at all)
 The strength and direction of the correlation coefficient is summarized in the diagram below:

 The use of correlation between two variables relies on some underlying assumptions:
o The observations are assumed to be independent of one another
o The variables are assumed to have been randomly selected from the population
o The variables are normally distributed
o The association of data is homoscedastic (homogeneous), i.e. the data have the same
standard deviation in different groups
o The relationship between the two variables is linear.

Correlation analysis
 Correlation analysis is a group of techniques to measure the relationship between variables.
 It is also used along with regression analysis to measure how well the regression line explains
the variations of the dependent variable with the independent variable.
 The commonly used methods for studying the relationship between two variables involve both
graphic and algebraic methods. Some of the widely used methods include:
(i) Pearson’s Coefficient of Correlation
(ii) Spearman’s Rank Correlation
(iii) Kendall's Tau / Concurrent Deviation Method: calculating Kendall's Tau manually
is very tedious and is rarely done without a computer.
(iv) Scatter Diagram
(v) Correlation Graph
 Only the first two are considered for this unit


Question: A study is conducted involving 10 companies to investigate the association between
monthly advertisement slots and profits. Is there a relationship between the monthly
advertisement slots and profits (in millions) gained by the 10 companies?

Company                 1     2     3     4     5     6     7     8     9    10
Advertisement slots    35    40    38    44    67    64    59    69    25    50
Profits               112   128   130   138   158   162   140   175   125   142

(i) Pearson’s Coefficient of Correlation


 Pearson’s coefficient of correlation is the most widely used statistical method for measuring
the intensity or magnitude of the linear relationship between two variables.
 For example, in the stock market, if we want to measure how two commodities are related to
each other, Pearson correlation is used to measure the degree of relationship between the two
commodities.
 Denoted as 𝒓(𝑿, 𝒀) or 𝒓𝒙𝒚 or simply r
 The following formulas are used to calculate Pearson’s correlation coefficient:

Method 1:
$$r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}$$
Method 2:
$$r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}}$$

**Calculate the Pearson’s correlation coefficient for the data on advertisement slots and
profits
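One way to carry out this calculation is sketched below, using Method 2 on the advertisement slots and profits data. The function name `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt

# Data from the question: monthly advertisement slots (X) and profits in millions (Y).
slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]
profits = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]


def pearson_r(x, y):
    """Pearson's r via Method 2 (sums of X, Y, XY, X squared and Y squared)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(slots, profits)
print(round(r, 2))
```

For this data r comes out at approximately 0.90, which points to a strong positive linear relationship between advertisement slots and profits.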

(ii) Spearman’s Rank Correlation


- In some instances, the variables under consideration are not capable of quantitative
measurement but can be arranged in serial order.
- This happens when dealing with qualitative characteristics (attributes) such as honesty,
level of satisfaction etc. which cannot be measured quantitatively but can be arranged
serially.
- In such situations, Spearman’s correlation is computed by obtaining the correlation
coefficient between the ranks of N individuals in the two attributes under study.
- The Spearman rank correlation test does not make any assumptions about the distribution of
the data and is the appropriate correlation analysis when the variables are measured on a
scale that is at least ordinal.
- It is denoted by ρ (rho) and is calculated by the formula:
$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$
where d is the difference between the pair of ranks of the same individual in the two
characteristics and N is the number of pairs.

Example 1
Ten entries are submitted for a competition. Three judges study each entry and list the ten in
rank order. Their rankings are as follows:
Entry A B C D E F G H I J
Judge J1 9 3 7 5 1 6 2 4 10 8
Judge J2 9 1 10 4 3 8 5 2 7 6
Judge J3 6 3 8 7 2 4 1 5 9 10
Calculate the appropriate rank correlation and comment on which pair of judges agrees the
most and which pair of judges disagrees the most.
Solution:
Ranks by judges and differences in ranks
Entry J1 J2 J3 d(J1,J2) d² d(J1,J3) d² d(J2,J3) d²
A 9 9 6 0 0 3 9 3 9
B 3 1 3 2 4 0 0 -2 4
C 7 10 8 -3 9 -1 1 2 4
D 5 4 7 1 1 -2 4 -3 9
E 1 3 2 -2 4 -1 1 1 1
F 6 8 4 -2 4 2 4 4 16
G 2 5 1 -3 9 1 1 4 16
H 4 2 5 2 4 -1 1 -3 9
I 10 7 9 3 9 1 1 -2 4
J 8 6 10 2 4 -2 4 -4 16
∑d² for (J1,J2) = 48; ∑d² for (J1,J3) = 26; ∑d² for (J2,J3) = 88

$$\rho(J_1, J_2) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 48}{10(10^2 - 1)} = 1 - \frac{288}{990} = 1 - 0.29 = +0.71$$

$$\rho(J_1, J_3) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 26}{10(10^2 - 1)} = 1 - \frac{156}{990} = 1 - 0.1576 = +0.84$$

$$\rho(J_2, J_3) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 88}{10(10^2 - 1)} = 1 - \frac{528}{990} = 1 - 0.53 = +0.47$$

Conclusion: Judges J1 and J3 agree the most, while Judges J2 and J3 disagree the most
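
The worked solution above can be checked with a short Python sketch. The helper name `spearman_from_ranks` is an illustrative choice; it applies the rank-difference formula directly and assumes the ranks contain no ties.

```python
from itertools import combinations

# Rankings of entries A..J by each judge, as given in Example 1
rankings = {
    "J1": [9, 3, 7, 5, 1, 6, 2, 4, 10, 8],
    "J2": [9, 1, 10, 4, 3, 8, 5, 2, 7, 6],
    "J3": [6, 3, 8, 7, 2, 4, 1, 5, 9, 10],
}

def spearman_from_ranks(rank_a, rank_b):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1)), assuming no tied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Coefficient for every pair of judges
rho = {
    (a, b): spearman_from_ranks(rankings[a], rankings[b])
    for a, b in combinations(rankings, 2)
}
for pair, value in rho.items():
    print(pair, round(value, 2))

most_agree = max(rho, key=rho.get)      # pair with the highest rho
most_disagree = min(rho, key=rho.get)   # pair with the lowest rho
print("agree most:", most_agree, "disagree most:", most_disagree)
```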


- Spearman’s rank correlation formula can also be used with quantitative variables,
i.e. when the actual data, rather than the ranks, relating to the two variables are given.
- In such a case the data must first be converted into ranks. The highest (or the lowest) observation
is given rank 1, the next highest (or the next lowest) observation is given rank 2, and so on. It
makes no difference in which way (descending or ascending) the ranks are assigned. However,
the same approach should be followed for all the variables under consideration.

**Example 2: Calculate the Spearman’s rank correlation coefficient for the data on
advertisement slots and profits

Example 3
Calculate the rank coefficient of correlation from the following data:
X 75 88 95 70 60 80 81 50
Y 120 134 150 115 110 140 142 100
Solution:
X Rank Rx Y Rank Ry d = Rx − Ry d²
75 5 120 5 0 0
88 2 134 4 -2 4
95 1 150 1 0 0
70 6 115 6 0 0
60 7 110 7 0 0
80 4 140 3 1 1
81 3 142 2 1 1
50 8 100 8 0 0
$$\sum d^2 = 6$$

$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 6}{8(8^2 - 1)} = 1 - \frac{36}{504} = 1 - 0.07 = +0.93$$
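
The conversion of raw data into ranks in Example 3 can be sketched in Python as follows. The helper `descending_ranks` is a hypothetical name; it gives rank 1 to the largest value and assumes there are no tied values, since the formula used here does not handle ties.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# Data from Example 3
x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]

rank_x = descending_ranks(x)
rank_y = descending_ranks(y)

n = len(x)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_x)
print(rank_y)
print(sum_d2, round(rho, 2))
```

The same helper could be applied to the advertisement slots and profits data in Example 2, as long as the same ranking direction is used for both variables.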

Limitations of Correlation Analysis


1. Correlation analysis cannot determine cause-and-effect relationship. One should not assume that
a change in Y variable is caused by a change in X variable unless one is reasonably sure that one
variable is the cause while the other is the effect.
2. Another mistake that occurs frequently is the misinterpretation of the coefficient of correlation.
Suppose that in one case r = 0.7. It would be wrong to conclude that the correlation explains 70 percent of
the total variation in Y. The error is easily seen by calculating the coefficient of
determination: here r² = 0.7² = 0.49, which means that only 49
percent of the total variation in Y is explained.
3. Another mistake in the interpretation of the coefficient of correlation occurs when one concludes
a positive or negative relationship even though the two variables are actually unrelated. For
example, the age of students and their score in the examination have no relation with each other.


The two variables may show similar movements but in reality there does not seem to be a common
link between them.

Assignment:
1. The data on price and quantity purchased relating to a commodity for 5 months is given below:
Month January February March April May
Prices 10 10 11 12 12
Quantity 5 6 4 3 3
Find the Pearson correlation coefficient between price and quantity and comment on its sign and
magnitude.
2. Calculate Spearman’s rank correlation coefficient between advertisement cost (X) and sales
(Y) from the following data:
X 39 65 62 90 82 75 25 98 36 78
Y 47 53 58 86 62 68 60 91 51 84

REGRESSION
WHAT IS REGRESSION?
 The previous lesson evaluated the direction and the significance of the linear relationship between
two variables by finding the correlation coefficient. If the correlation coefficient is significantly
different from zero, then the next step is to develop an equation to express the linear relationship
between the two variables.
 Regression analysis is one of the most commonly used statistical techniques in the social and
behavioral sciences as well as in the physical sciences. It involves identifying and evaluating the
relationship between a dependent (response) variable and one or more independent variables
(also called predictor or explanatory variables).
 It involves modeling the relationship between two or more variables.
 Once the relation between the dependent variable and independent variable(s) has been modelled
using a regression equation, various tests are then employed to determine if the model is
satisfactory.
 If the model is deemed satisfactory, the estimated regression equation can be used to predict the
value of the dependent variable given values for the independent variables
 There are two types of linear regression:
o Simple linear regression – involves a single continuous dependent variable and a single
independent variable. For example, when trying to predict the demand for television sets
based on the prices of the televisions, the demand for television represents the dependent
variable while the prices of the television sets is the independent variable
o Multiple linear regression – involves a single dependent variable and two or more independent variables at a
time. For example, when trying to predict the demand for television sets based on the prices
of the televisions, the price of radios, number of advertisements etc., the demand for


television represents the dependent variable while the prices of the television sets, the price
of radios, number of advertisements etc. are the independent variables.

SIMPLE LINEAR REGRESSION


 In regression analysis, the objective is to use data to position a line that best represents the
relationship between two variables.
 Simple linear regression allows us to summarize and study relationships between two continuous
(quantitative) variables.
 In a cause and effect relationship, the independent variable is the cause, and the dependent variable
is the effect.
 Least squares linear regression is a method for predicting the value of a dependent variable Y
based on the value of an independent variable X. If X and Y are two variables whose
relationship is to be described, the line that gives the best estimate of Y for any value of X is called the
regression line of Y on X.
 If X is instead taken as the dependent variable, the best estimate of X for any value of Y is given by the
regression line of X on Y.
 Mathematically, the regression model is represented by the following equation:
$$y = \alpha + \beta x + e$$
Where
x - independent variable
y - dependent variable
α - intercept of the regression line with the y axis
- it indicates the height on the vertical axis from where the straight line originates, representing the
value of Y when X = 0
β - the slope of the regression line
- it shows the absolute change in Y for a unit change in X.
- As the slope may be positive or negative, it indicates the nature of the relationship between Y and X.
- Accordingly, β is also known as the regression coefficient of Y on X.
e - error term (it may be positive or negative)

- The two values can be obtained directly as follows (N is the number of cases/individuals):

Method 1:
$$\beta = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}, \qquad \alpha = \bar{Y} - \beta \bar{X}$$

Method 2 (instead of directly computing β, we may first compute the value of α):
$$\alpha = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2}, \qquad \beta = \frac{\bar{Y} - \alpha}{\bar{X}}$$

** Determine and interpret the regression equation for the data provided on advertisement slots and
profits
**Estimate the profits earned by a company that advertises 42 times in a month
Assignment: Determine the regression of X on Y for the same data (Hint: The dependent variable is now
advertisement, while the independent variable is profits)
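
For the regression of profits (Y) on advertisement slots (X), the calculation might be sketched in Python as follows. The function name `least_squares` is an illustrative choice; it computes the slope and intercept by Method 1 and, as a cross-check, the intercept by Method 2.

```python
# Data from the question: advertisement slots (X) and profits in millions (Y)
slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]
profits = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]

def least_squares(x, y):
    """Return (alpha, beta, alpha_direct) for the regression line of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    denom = n * sum_x2 - sum_x ** 2
    beta = (n * sum_xy - sum_x * sum_y) / denom               # Method 1 slope
    alpha = sum_y / n - beta * sum_x / n                      # Method 1 intercept
    alpha_direct = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # Method 2 intercept
    return alpha, beta, alpha_direct

alpha, beta, alpha_direct = least_squares(slots, profits)
predicted = alpha + beta * 42   # estimated profit for 42 slots in a month

print(round(alpha, 2), round(beta, 2), round(predicted, 1))
```

Under these formulas the fitted line is roughly y = 85.04 + 1.14x, so each additional advertisement slot is associated with about 1.14 million more in profit, and a company advertising 42 times in a month would be estimated to earn about 132.9 million.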

PROPERTIES OF REGRESSION COEFFICIENTS (𝜷)


 As explained earlier, the slope of regression line is called the regression coefficient. It tells the
effect on dependent variable if there is a unit change in the independent variable.
 Since for paired data on X and Y variables there are two regression lines, the regression line of Y on
X and the regression line of X on Y, there are two regression coefficients:
o Regression coefficient of Y on X, denoted by 𝛽𝑦𝑥
o Regression coefficient of X on Y, denoted by 𝛽𝑥𝑦
 The following are the important properties of regression coefficients:
(i) Both regression coefficients cannot be greater than 1 in absolute value at the same time. Both
may be below 1, but if one exceeds 1 the other must be below 1, so that the square root
of the product of the two regression coefficients lies within the limits ±1.
(ii) Coefficient of determination is the product of both the regression coefficients i.e.
𝒓𝟐 = 𝛃𝐲𝐱 × 𝜷𝒙𝒚
It is the proportion of the variation in the dependent variable that is predictable from the
independent variable. It measures how well a statistical model predicts an outcome.
The lowest possible value is 0 while the highest is 1
(iii) Coefficient of correlation is the geometric mean of the regression coefficients, i.e.
$$r = \pm\sqrt{\beta_{yx} \times \beta_{xy}}$$
The two regression coefficients always have the same sign, and r takes that sign: r is positive if both
coefficients are positive and negative if both are negative.
(iv) The arithmetic mean of the regression coefficients is equal to or greater than the coefficient of
correlation (in absolute value when the coefficients are negative), i.e.
$$\frac{\beta_{yx} + \beta_{xy}}{2} \geq r$$
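
These properties can be checked numerically on the advertisement slots and profits data. The Python sketch below uses a hypothetical helper `slope`, which returns the regression coefficient of its second argument on its first.

```python
import math

slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]
profits = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]

def slope(x, y):
    """Regression coefficient of y on x (least squares slope)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = n * sum(a * a for a in x) - sum(x) ** 2
    return numerator / denominator

beta_yx = slope(slots, profits)   # profits (Y) on slots (X)
beta_xy = slope(profits, slots)   # slots (X) on profits (Y)

r_squared = beta_yx * beta_xy                          # coefficient of determination
r = math.copysign(math.sqrt(r_squared), beta_yx)      # r takes the sign of the coefficients
mean_beta = (beta_yx + beta_xy) / 2                    # arithmetic mean of the coefficients

print(round(beta_yx, 3), round(beta_xy, 3))
print(round(r_squared, 3), round(r, 3), round(mean_beta, 3))
```

Here one coefficient is above 1 and the other below 1, their product lies between 0 and 1, and their arithmetic mean is at least as large as r.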
CORRELATION ANALYSIS VERSUS REGRESSION ANALYSIS
 Correlation and Regression are the two related aspects of a single problem of the analysis of the
relationship between the variables.
 If we have information on more than one variable, we might be interested in seeing if there is any
connection or association between them. (Correlation)
 If such an association is found, we might again be interested in predicting the value of one variable
for the given and known values of other variable(s). (Regression)
1. Correlation indicates the relationship between two or more variables such that the movements in
one tend to be accompanied by the corresponding movements in the other(s). On the other hand,


regression is a mathematical measure expressing the average relationship between the two
variables.
2. Correlation coefficient 𝑟 between two variables X and Y is symmetric, i.e., 𝒓𝒚𝒙 = 𝒓𝒙𝒚 and it is
immaterial which of X and Y is dependent variable and which is independent variable.
Regression coefficients are not symmetric in X and Y, i.e., 𝜷𝒚𝒙 ≠ 𝜷𝒙𝒚 and reflect upon the nature
of the variable, i.e., which is dependent variable and which is independent variable
3. Correlation need not imply a cause-and-effect relationship between the variables under study.
Regression analysis, however, treats one variable as the cause and the other as the effect:
the variable corresponding to the cause is taken as the independent variable and the variable
corresponding to the effect is taken as the dependent variable.
4. Correlation coefficient 𝑟𝑥𝑦 is a relative measure of the linear relationship between X and Y and
is independent of the units of measurement. It is a pure number lying between −1 and +1.
On the other hand, the regression coefficients 𝛽𝑦𝑥 and 𝛽𝑥𝑦 are absolute measures representing
the change in the value of the variable Y (or X) for a unit change in the value of the variable X
(or Y). Once the functional form of the regression curve is known, by substituting the value of the
independent variable we can obtain the value of the dependent variable, and this value will be in
the units of measurement of the dependent variable.
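
The contrast in point 4 can be illustrated with a small Python sketch, assuming the profits from the advertisement example are re-expressed in thousands instead of millions. The helper names `pearson_r` and `slope_y_on_x` are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    var_x = n * sum(a * a for a in x) - sum(x) ** 2
    var_y = n * sum(b * b for b in y) - sum(y) ** 2
    return numerator / math.sqrt(var_x * var_y)

def slope_y_on_x(x, y):
    """Regression coefficient of y on x."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    return numerator / (n * sum(a * a for a in x) - sum(x) ** 2)

slots = [35, 40, 38, 44, 67, 64, 59, 69, 25, 50]
profits_millions = [112, 128, 130, 138, 158, 162, 140, 175, 125, 142]
profits_thousands = [p * 1000 for p in profits_millions]  # same profits, different unit

r_millions = pearson_r(slots, profits_millions)
r_thousands = pearson_r(slots, profits_thousands)
b_millions = slope_y_on_x(slots, profits_millions)
b_thousands = slope_y_on_x(slots, profits_thousands)

print(round(r_millions, 4), round(r_thousands, 4))   # r does not depend on the unit
print(round(b_millions, 4), round(b_thousands, 1))   # the slope is expressed in the unit of Y
```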
