
SM For Statistics


STUDY MATERIAL -

STATISTICS FOR MANAGEMENT


TERMINAL QUESTIONS
UNIT 1
Q1. Mention the characteristics of Statistics.
Ans.1 Statistics has several characteristics. It deals with aggregates of facts that are
affected by multiple causes, are expressed numerically, are enumerated or estimated with a
required degree of accuracy, and are collected in a systematic manner for a pre-determined
purpose. To permit comparative and analytical study, statistical facts must also be placed in
relation to each other in a systematic, logical order. Let us look at each characteristic in detail.
• Statistics deals with an aggregate of facts.
A single figure cannot be analysed. For example, the fact ‘Mr Kiran is 170 cm tall’ cannot be
statistically analysed. On the other hand, if we know the heights of 60 students in a class, we
can comment upon the average height and variation.
• Statistics gets affected primarily by multiplicity of causes.
The Statistics of the yield of a crop is the result of several factors, such as the fertility of soil,
amount of rainfall, the quality of seed used, the quality and quantity of fertilizer used.
• Statistics are numerically expressed.
Only numerical facts can be statistically analysed. Therefore, qualitative statements such as
‘price decreases with increasing production’ cannot be called statistics. Likewise, categorical
(qualitative) data, for example the eye colour of a person or the brand name of an automobile,
cannot be called statistics.
• Statistics are enumerated or estimated with required degree of accuracy.
The facts must be collected from the field or estimated (computed) with the required degree
of accuracy. The degree of accuracy differs depending upon the purpose. For example, in
measuring the length of screws, an accuracy of up to a millimetre may be required, whereas
while measuring the heights of students in a class, an accuracy of up to a centimetre is enough.
• Statistics are collected in a systematic manner.
The facts should be collected according to planned and scientific methods; otherwise, they are
likely to be wrong and misleading.
• Statistics are collected for a pre-determined purpose.
There must be a definite purpose for collecting facts. Otherwise, indiscriminate data collection
might take place which would lead to wrong diagnosis.

• Statistics are placed in relation to each other.


The facts must be placed in such a way that a comparative and analytical study becomes
possible. Thus, only related facts which are arranged in a logical order can be called Statistics.
Statistical analysis cannot be used to compare heterogeneous data.

Q2. Give the meaning of the word Statistics.


Ans.2 According to Seligman, “Statistics is a science which deals with the method of
collecting, classifying, presenting, comparing and interpreting the numerical data to throw
light on enquiry”.
According to Horace Secrist, Statistics may be defined as “an aggregate of facts affected to a
marked extent by multiplicity of causes, numerically expressed, enumerated or estimated
according to a reasonable standard of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other”. This definition is both
comprehensive and exhaustive.
Prof. Boddington, on the other hand, defined Statistics as “The science of estimates and
probabilities”.
According to Croxton and Cowden, “Statistics is the science of collection, presentation,
analysis and interpretation of numerical data from logical analysis”.

Q3. What are the limitations of Statistics?


Ans.3 Statistics has certain limitations:
1. Statistics does not deal with qualitative data
Qualitative data describes properties or characteristics that are used to identify things,
whereas quantitative data describes things in terms of quantity, using numerical figures
accompanied by a unit of measurement. Statistics deals only with quantitative data, that is,
data that can be expressed as quantitative measurements. Qualitative phenomena such as
beauty and intelligence cannot be expressed numerically, so statistical analysis cannot be
applied to them directly. However, statistical techniques may be applied indirectly by first
reducing the qualitative data to quantitative terms. For example, the intelligence of a group
of students can be studied based on their marks in a particular examination.
2. Statistics does not deal with individual facts
Statistical methods can be applied only to aggregates of facts, because analysis and
interpretation of data is highly difficult in the case of individual facts.
3. Statistical inferences (conclusions) are not exact
Statistical inferences are true only on average; they are probabilistic statements. For
example, inferences drawn from data on the heights of 200 male students of a graduate
school may not hold true for any individual student.
4. Statistics can be misused and misinterpreted
Lack of sufficient knowledge of statistical science often leads to incorrect conclusions.
Therefore, proper care must be taken while selecting the collection method and in choosing
appropriate statistical models. Increasing misuse of Statistics has led to increasing distrust in
Statistics.
5. Common men cannot handle Statistics properly
The field of Statistics is so vast that it needs experience as well as skill to understand it
effectively and apply the statistical concepts and models. Hence, only statisticians can handle
statistics properly.

Q4. What are the functions of Statistics?


Ans.4 Following are the functions of statistics:
1. Statistics simplifies mass data
The use of statistical concepts helps in simplification of complex data. Using statistical
concepts, the managers can make decisions more easily. The statistical methods help in
reducing the complexity of the data and in the understanding of any huge mass of data.
2. Statistics brings out trends and tendencies in the data
After data is collected, it is easy to analyse the trend and tendencies in the data by using the
various concepts of Statistics.
3. Statistics brings out the hidden relations between variables
Statistical analysis helps in drawing inferences on the data. Statistical analysis brings out the
hidden relations between variables.
4. Statistics makes decision making easier
With the proper application of Statistics and statistical software packages on the collected data,
managers can take effective decisions, which can increase the profits in a business.
5. Statistics makes comparison easier
Without using statistical methods and concepts, collection of data and comparison would be
difficult. Statistics helps us to compare data collected from various sources. Grand totals,
measures of central tendency and measures of dispersion, graphs and diagrams and coefficient
of correlation all provide ample scope for comparison.

5. What is the importance of Statistics in modern business environment?


Ans.5 Due to advanced communication networks, rapid changes in consumer behaviour,
varied expectations of a variety of consumers and new market openings, modern managers
have a difficult time in making quick and appropriate decisions. Therefore, there is a need for
them to depend more upon quantitative techniques like mathematical models, statistics,
operations research and econometrics. In this section, there are examples that illustrate some
of the uses of statistics in business and economics.

Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their
clients.
Finance
Financial advisors use a variety of statistical information to guide their investment
recommendations.
Marketing
Electronic scanners at retail checkout counters are being used to collect data for a variety of
marketing research applications.
Production
Today’s emphasis is on quality, which is of utmost importance in production. A variety of
statistical quality control charts are used to monitor the average output of a production
process.
Economics
Economists are frequently asked to provide forecasts about the future of the economy. They
use a variety of statistical information in making such forecasts. For example, in forecasting
the inflation rate, economists use statistical information on indicators such as the producer
price index, the unemployment rate and manufacturing capacity utilisation.

6. Explain any two applications of Statistics.


Ans.6 Statistical methods are applied to specific problems in various fields such as Biology,
Medicine, Agriculture, Commerce, Business, Economics, Industry, Insurance, Sociology and
Psychology.

In the field of medicine, statistical tools like t-tests are used to test the efficacy of a new
drug or medicine. In the field of economics, statistical tools such as index numbers, estimation
theory and time series analysis are used in solving economic problems related to wages, price,
production and distribution of income. In the field of agriculture, analysis of variance
(ANOVA) is used in experiments to test the significance of differences among the means of
several samples.

Statistics is a part of Economics, Commerce and Business. Statistical analysis of the variations
in price, demand and production is helpful to both businessmen and economists. Cost of
living index numbers help governments in economic planning and fixation of wages. A
government’s administrative system is fully dependent on production statistics, income
statistics, labour statistics, economic indices of cost, and price. Economic planning of any
nation is entirely based on the statistical facts. Cost of living index numbers are also used to
estimate the value of money. In business activities, analysis of demand, price, production cost,
and inventory costs help in decision making.

UNIT 2

1. What is statistical survey?


Ans.1 A Statistical Survey is a scientific process of collection and analysis of numerical data.
Statistical surveys are used to collect information about units in a population, and they usually
involve asking questions of individuals. Surveys of human populations are common in the
government, health, social science and marketing sectors.

2. Enumerate the factors which should be kept in mind for proper planning.
Ans.2
1. Objective and Scope:
- Clearly define survey purpose and objectives.
- Determine the scope of the study.
2. Population and Sampling:
- Identify target population.
- Choose a representative sample.

3. Sample Size:
- Determine an appropriate sample size for significance.
4. Sampling Frame:
- Develop a comprehensive list of the population.
5. Survey Design:
- Choose appropriate survey methods.
- Design clear and unbiased questions.
6. Data Collection Tools:
- Select suitable data collection tools.
- Train surveyors for consistency.
7. Pilot Testing:
- Test survey instruments for issues.
- Evaluate clarity and effectiveness.
8. Data Analysis Plan:
- Determine analysis methods.
- Plan for handling incomplete data.
9. Timeline:
- Develop a realistic survey timeline.
10. Budget:
- Estimate financial resources required.
11. Ethical Considerations:
- Ensure compliance with ethical standards.
- Address potential biases.
12. Quality Control:
- Implement measures for data quality.
- Include validation checks.
13. Data Security:
- Protect confidentiality and security of data.
14. Reporting and Dissemination:
- Plan for presenting and disseminating results.
- Tailor reporting methods for the audience.

3. What do you understand by the unit of measurement? Explain with examples.
Ans.3 It refers to the unit of the population on which measurements are made, for example,
the height of employees in an office. Employees are individuals or units. Height is the
measurement made on them.

4. Distinguish between:
a) Primary and secondary data
b) Direct and indirect investigation
c) Questionnaire and schedule

Ans.4 a) Data collected for the first time by the investigator is primary data. Data collected by
some other persons but used by the investigator for his/her study is known as secondary data.

b) Direct investigations are carried out directly by the investigator. Investigation conducted
through mail questionnaire is called indirect investigation.

c) Questionnaires contain simple questions and are filled by respondents. Schedules also
contain questions, but responses are recorded directly by the investigator.

UNIT 3
1. Form frequency distribution for the following data regarding weight of 50 people.
Table 3.48: Data regarding weight of 50 people

50 72 61 64 72 62 61 56 75 55
52 71 54 64 71 64 59 59 70 54
60 60 57 57 66 68 60 62 68 54
62 65 58 64 65 60 60 67 58 56
70 62 60 68 64 62 59 69 52 58

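Ans.1 Frequency Distribution Table (exclusive method: each class includes its lower limit and
excludes its upper limit, so a weight of 55 is counted in 55-60)

Class Interval    Frequency
50-55             6
55-60             11
60-65             18
65-70             8
70-75             6
75-80             1
Total             50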
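
The tally can be checked mechanically. The following is a minimal sketch, assuming the exclusive convention (lower limit included, upper limit excluded); the function name `frequency_table` and its parameters are illustrative and not part of the study material.

```python
from collections import Counter

# Weights of 50 people, copied row by row from Table 3.48.
weights = [
    50, 72, 61, 64, 72, 62, 61, 56, 75, 55,
    52, 71, 54, 64, 71, 64, 59, 59, 70, 54,
    60, 60, 57, 57, 66, 68, 60, 62, 68, 54,
    62, 65, 58, 64, 65, 60, 60, 67, 58, 56,
    70, 62, 60, 68, 64, 62, 59, 69, 52, 58,
]

def frequency_table(values, start, width, n_classes):
    """Count values per class [lower, upper); hypothetical helper."""
    counts = Counter((v - start) // width for v in values)
    return {
        f"{start + i * width}-{start + (i + 1) * width}": counts.get(i, 0)
        for i in range(n_classes)
    }

table = frequency_table(weights, start=50, width=5, n_classes=6)
for interval, freq in table.items():
    print(interval, freq)
```

Under the inclusive convention the boundary values 55, 60, 65, 70 and 75 would move to the lower class, so the convention should always be stated with the table.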

2. Junior executive of XYZ Company has prepared budget for a new division of the
company. Table 3.49 depicts the budget data. Vice president of the company wanted to
see the summary of the budget in a diagrammatic form. Prepare a pie diagram.
Table 3.49: Budget of XYZ Company

CATEGORY Rs. In Lakhs


Capital investment 140
Salary and wages 65
Raw material 100
Research and development expenses 15
Miscellaneous 40

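Ans.2 The total budget is 140 + 65 + 100 + 15 + 40 = Rs. 360 lakhs. The angle for each
category is (amount / total) × 360°, so in this case each angle in degrees equals the amount
in lakhs.

CATEGORY                              Angle subtended at the centre of the circle
Capital investment                    140°
Salary and wages                      65°
Raw material                          100°
Research and development expenses     15°
Miscellaneous                         40°
Total                                 360°

[Figure: pie diagram with sectors for Capital Investment, Salary and Wages, Raw material, R&D and Misc.]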

3. ABC Ice Cream Company attempts to keep all its ten flavours of ice cream in stock at
each of its stores. In-charge of stores operation collects data on the daily amount of each
flavour to the nearest half gallon.

i. Is the flavour classification discrete or continuous? Open or closed?


ii. Data collected, is it qualitative or quantitative?
iii. Is the amount collected on each flavour discrete or continuous?
Ans.3 i. Discrete and closed
ii. Flavour is qualitative. Volume is quantitative
iii. Continuous

4. Table 3.50 depicts certain data. Construct histogram for this data.

Table 3.50: Frequency Table

Class 0-5 5-10 10-15 15-20 20-25 25-30


Frequency 4 6 10 5 3 1

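Ans.4 The figure below depicts the histogram for the given data: adjacent bars of width 5 over
the classes 0-5 to 25-30, with heights equal to the frequencies 4, 6, 10, 5, 3 and 1.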

5. Association of real estate sellers has collected data on a sample of 100 people with
respect to the monthly commission earned by them. Table 3.51 depicts certain data.
Construct an ogive. Find:
i. What proportion of salespeople earn more than 25,000
ii. What proportion earn between 15,000 and 25,000.

Table 3.51: Collected Data of 100 People with Respect to Commissions Earned

Earnings        5000 or less   5000-10000   10000-15000   15000-20000   20000-25000   25000-30000
No. of People   5              9            13            30            27            16

Ans.5
Earnings (up to)   Frequency   Cumulative Frequency
≤ 5000             5           5
≤ 10000            9           14
≤ 15000            13          27
≤ 20000            30          57
≤ 25000            27          84
≤ 30000            16          100
TOTAL              100

[Figure: less-than ogive of cumulative frequency against earnings]

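i. More than 25,000: 100 - 84 = 16 of 100 people, i.e. 16%
ii. Between 15,000 and 25,000: 84 - 27 = 57 of 100 people, i.e. 57%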
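
The two proportions follow directly from the cumulative frequencies. A minimal sketch, assuming the class upper limits are used as the ogive's x-values; the variable names are illustrative only.

```python
from itertools import accumulate

# Upper class limits and frequencies from Table 3.51.
upper_limits = [5000, 10000, 15000, 20000, 25000, 30000]
frequencies = [5, 9, 13, 30, 27, 16]

# "Less than or equal to" cumulative frequencies, i.e. the ogive's y-values.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
less_than = dict(zip(upper_limits, cumulative))

# Proportion earning more than 25,000 and between 15,000 and 25,000.
prop_above_25000 = (total - less_than[25000]) / total
prop_15000_to_25000 = (less_than[25000] - less_than[15000]) / total

print(cumulative, prop_above_25000, prop_15000_to_25000)
```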

UNIT 4
1. In an office there are 84 employees. Their salaries in Indian rupees are as given in
table. Find the mean salary per day.
Table 4.60: Salaries of 84 Employees
Salary/Day 60 70 80 90 100 120
Employees 3 5 8 10 4 2

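Ans.1 Mean = $\dfrac{\sum f_i x_i}{\sum f_i}$

Salary/Day ($x_i$)   Employees ($f_i$)   $f_i x_i$
60                   3                   180
70                   5                   350
80                   8                   640
90                   10                  900
100                  4                   400
120                  2                   240
TOTAL                $\sum f_i = 32$     $\sum f_i x_i = 2710$

Mean = $\dfrac{2710}{32} = 84.69$

Note that the frequencies in Table 4.60 add up to 32, not 84, so the mean is computed over
the 32 employees listed.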
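
As a quick check of the weighted mean, here is a small sketch using the values of Table 4.60; the variable names are illustrative.

```python
# Salary per day (x_i) and number of employees (f_i) from Table 4.60.
salaries = [60, 70, 80, 90, 100, 120]
employees = [3, 5, 8, 10, 4, 2]

total_f = sum(employees)
total_fx = sum(f * x for f, x in zip(employees, salaries))

# Weighted (frequency) mean: sum(f * x) / sum(f).
mean_salary = total_fx / total_f
print(total_f, total_fx, round(mean_salary, 2))
```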

2.A survey of 128 smokers gave the results represented in table 4.61, which are frequency
distributions of smokers’ daily expenses on smoking. Find the mean expenses and
standard deviation. Determine coefficient of variation.
Table 4.61: Survey Results of 128 Smokers

Expenditure (Rs.)   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of Smokers      23      44      35      12      9       3       2

Ans.2
Class Interval   Frequency ($f$)   Mid-point ($X$)   $fX$    Deviation ($D = X - \bar{X}$)   $D^2$     $f D^2$
10-20            23                15                345     -16.64                          276.88    6368.46
20-30            44                25                1100    -6.64                           44.08     1939.52
30-40            35                35                1225    3.36                            11.28     394.8
40-50            12                45                540     13.36                           178.48    2141.76
50-60            9                 55                495     23.36                           545.68    4911.12
60-70            3                 65                195     33.36                           1112.88   3338.64
70-80            2                 75                150     43.36                           1880      3760
TOTAL            $\sum f = 128$                      $\sum fX = 4050$                                  $\sum f D^2 = 22854.3$

Mean = $\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{4050}{128} = 31.64$

Standard Deviation = $\sqrt{\dfrac{\sum f D^2}{\sum f}} = \sqrt{\dfrac{22854.3}{128}} = \sqrt{178.54} = 13.36$

Coefficient of Variation = $\dfrac{\text{S.D.}}{\bar{X}} \times 100 = \dfrac{13.36}{31.64} \times 100 = 42.225$
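
The grouped mean, standard deviation and coefficient of variation can be reproduced as follows. This is a sketch under the usual assumption that every observation sits at its class mid-point; it uses the population form of the variance (divide by Σf), and the names are illustrative.

```python
from math import sqrt

# Class limits and frequencies from Table 4.61.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [23, 44, 35, 12, 9, 3, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
sd = sqrt(variance)
cv = sd / mean * 100  # coefficient of variation, in percent

print(round(mean, 2), round(sd, 2), round(cv, 2))
```

Small differences in the last decimal place compared with a hand calculation come from rounding the deviations before squaring them.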

3. For the distribution shown in table 4.62, find the median and mode.
% Marks          0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of Smokers   4      9       19      20      18      7       8

Ans.3
% Marks   No. of Smokers ($f$)   Cumulative Frequency (cf)   Mid-point ($x$)   $fx$
0-10      4                      4                           5                 20
10-20     9                      13                          15                135
20-30     19                     32                          25                475
30-40     20                     52                          35                700
40-50     18                     70                          45                810
50-60     7                      77                          55                385
60-70     8                      85                          65                520
TOTAL     $N = \sum f = 85$                                                    $\sum fx = 3045$

Now, $N = 85$, so $N/2 = 42.5$.
The cumulative frequency just greater than 42.5 is 52, and the corresponding class is 30-40.
Therefore, $l = 30$, $h = 10$, $f = 20$, and $cf = 32$ (the cumulative frequency of the class
preceding the median class).
Median = $l + \dfrac{N/2 - cf}{f} \times h = 30 + \dfrac{42.5 - 32}{20} \times 10 = 30 + 5.25 = 35.25$
Mean = $\dfrac{\sum fx}{N} = \dfrac{3045}{85} = 35.82$
Using the empirical relation between mean, median and mode:
Mode = 3(Median) – 2(Mean)
= 3(35.25) – 2(35.82)
= 34.11
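
The median interpolation and the empirical mode can be sketched in a few lines. The helper `grouped_median` is hypothetical and assumes equal class widths; the mode here is the empirical estimate 3·Median − 2·Mean, not the modal-class formula.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must be non-empty and positive")

lower_limits = list(range(0, 70, 10))  # 0, 10, ..., 60
freqs = [4, 9, 19, 20, 18, 7, 8]
width = 10

median = grouped_median(lower_limits, freqs, width)
mean = sum(f * (lo + width / 2) for f, lo in zip(freqs, lower_limits)) / sum(freqs)
mode_empirical = 3 * median - 2 * mean

print(median, round(mean, 2), round(mode_empirical, 2))
```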
4. Find the geometric mean of the following distribution given in table.
X 110 115 118 119 120
f 4 11 21 6 2
Ans.4
x       $f$      log(x)    $f \log(x)$
110     4        2.0413    8.1652
115     11       2.0606    22.6666
118     21       2.0718    43.5078
119     6        2.0755    12.453
120     2        2.0791    4.1582
TOTAL   N = 44             $\sum f \log(x) = 90.9508$

G.M. = Antilog $\left\{ \dfrac{\sum f \log x}{N} \right\}$
= Antilog $\left\{ \dfrac{90.9508}{44} \right\}$
= Antilog {2.067}
= 116.7
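
The same geometric mean can be obtained without log tables. A minimal sketch; natural logarithms are used here instead of base-10 logarithms, which gives the same result, and the names are illustrative.

```python
from math import exp, log

xs = [110, 115, 118, 119, 120]
fs = [4, 11, 21, 6, 2]

n = sum(fs)
# Geometric mean = exp( sum(f * ln x) / N ), equivalent to antilog of the mean log.
geometric_mean = exp(sum(f * log(x) for f, x in zip(fs, xs)) / n)

print(n, round(geometric_mean, 1))
```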
5. Find the harmonic mean of the following distribution given in table.
𝒙 121 122 123 124 125
f 5 25 36 37 20
Ans.5
$x$     $f$               $f/x$
121     5                 0.0413
122     25                0.2049
123     36                0.2927
124     37                0.2984
125     20                0.1600
TOTAL   $\sum f = 123$    $\sum (f/x) = 0.9973$

Harmonic Mean = $\dfrac{N}{\sum (f/x)} = \dfrac{123}{0.9973} = 123.33$
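
A useful sanity check on any harmonic mean is that it can never exceed the arithmetic mean of the same data. The sketch below computes both for this distribution; the variable names are illustrative.

```python
xs = [121, 122, 123, 124, 125]
fs = [5, 25, 36, 37, 20]

n = sum(fs)
# Harmonic mean = N / sum(f / x); arithmetic mean = sum(f * x) / N.
harmonic_mean = n / sum(f / x for f, x in zip(fs, xs))
arithmetic_mean = sum(f * x for f, x in zip(fs, xs)) / n

print(round(harmonic_mean, 2), round(arithmetic_mean, 2))
```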

6.Given that, sum of upper and lower quartiles is 122 and their difference
is 23; find the quartile deviation of the series.
Ans.6 Quartile Deviation (Q.D) = (Q3 – Q1) / 2
= 23 / 2
= 11.5
7. If Coefficient of Variation = 22 and S.D. = 4, find the mean.
Ans.7 Given, Coefficient of Variation = 22
Standard Deviation = 4
We know, Coefficient of Variation = $\dfrac{\text{S.D.}}{\text{Mean}} \times 100$
$22 = \dfrac{4}{\text{Mean}} \times 100$
Mean = 400 / 22
Mean = 18.18
8. The table shows the distribution of age at the time of first delivery of 65 women. Find
mean deviation from mean and median.

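Age         18-22   22-26   26-30   30-34   34-38
Frequency   20      30      11      3       1
Ans.8
Mean deviation from the median:
Now, $N = 65$, so $N/2 = 32.5$.
The cumulative frequency just greater than 32.5 is 50, and the corresponding class is 22-26.
Therefore, $l = 22$, $h = 4$, $f = 30$, and $cf = 20$.
Median (M) = $l + \dfrac{h}{f}\left(\dfrac{N}{2} - cf\right) = 22 + \dfrac{4}{30}(32.5 - 20) = 23.67$

Age     Frequency ($f$)   C.F.   Mid-point ($X_i$)   $|X_i - M|$   $f\,|X_i - M|$
18-22   20                20     20                  3.67          73.40
22-26   30                50     24                  0.33          9.90
26-30   11                61     28                  4.33          47.63
30-34   3                 64     32                  8.33          24.99
34-38   1                 65     36                  12.33         12.33
TOTAL   $N = 65$                                                   $\sum f\,|X_i - M| = 168.25$

Mean Deviation (M.D.) about the median = $\dfrac{\sum f\,|X_i - M|}{\sum f} = \dfrac{168.25}{65} \approx 2.59$

Mean deviation from the mean:
Mean = $\dfrac{\sum f X_i}{N} = \dfrac{1560}{65} = 24$
$\sum f\,|X_i - 24| = 20(4) + 30(0) + 11(4) + 3(8) + 1(12) = 160$
M.D. about the mean = $\dfrac{160}{65} \approx 2.46$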
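
Both mean deviations asked for in the question can be computed side by side. This is a sketch assuming class mid-points represent each class; the median-class values (l = 22, h = 4, f = 30, cf = 20) are taken from the working above, and the names are illustrative.

```python
mids = [20, 24, 28, 32, 36]   # mid-points of 18-22, ..., 34-38
freqs = [20, 30, 11, 3, 1]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Median by interpolation in the class 22-26.
median = 22 + (n / 2 - 20) / 30 * 4

def mean_deviation(center):
    """Frequency-weighted mean absolute deviation about `center`."""
    return sum(f * abs(x - center) for f, x in zip(freqs, mids)) / n

md_about_mean = mean_deviation(mean)
md_about_median = mean_deviation(median)

print(mean, round(median, 2), round(md_about_mean, 2), round(md_about_median, 2))
```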
9. Read the data given below and find the combined mean, S.D. and coefficient of
variation.
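$n_1 = 15$, $n_2 = 20$
$\bar{X}_1 = 40$, $\bar{X}_2 = 50$
$\sigma_1 = 3$, $\sigma_2 = 5$
Ans.9 Combined Mean $\bar{X}_{12} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \dfrac{600 + 1000}{15 + 20} = \dfrac{1600}{35} = 45.7$
$d_1 = \bar{X}_1 - \bar{X}_{12} = -5.7$, so $d_1^2 = 32.49$
$d_2 = \bar{X}_2 - \bar{X}_{12} = 4.3$, so $d_2^2 = 18.49$
Combined S.D. = $\sqrt{\dfrac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} = \sqrt{\dfrac{15(9) + 20(25) + 15(32.49) + 20(18.49)}{15 + 20}}$
$= \sqrt{\dfrac{1492.15}{35}} = \sqrt{42.63} = 6.53$
Combined Coefficient of Variation = $\dfrac{6.53}{45.7} \times 100 \approx 14.3$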
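
The combined statistics of the two groups can be verified with a short sketch. It applies the pooled formula used above (group variances plus squared shifts of the group means from the combined mean); the variable names are illustrative.

```python
from math import sqrt

n1, mean1, sd1 = 15, 40, 3
n2, mean2, sd2 = 20, 50, 5

combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Shift of each group mean from the combined mean.
d1 = mean1 - combined_mean
d2 = mean2 - combined_mean

combined_variance = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / (n1 + n2)
combined_sd = sqrt(combined_variance)
combined_cv = combined_sd / combined_mean * 100

print(round(combined_mean, 1), round(combined_sd, 2), round(combined_cv, 1))
```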
10. Mean and Standard deviation of lengths of tails of 8 rats were found to be 4.7 cm and
0.8 cm respectively. However, one reading was taken as 3.6 cm instead of 6.3 cm; find the
corrected mean and standard deviation.
Ans.10
Mean = 4.7 cm, N = 8
$\bar{x} = \dfrac{\sum x}{N}$, so $\sum x = 4.7 \times 8 = 37.6$
If we replace the incorrect reading 3.6 cm with 6.3 cm,
$\sum x = 37.6 - 3.6 + 6.3 = 40.3$
Now, correct mean = $\dfrac{40.3}{8} = 5.0375$ cm
Standard deviation = 0.8 cm, so $(\text{S.D.})^2 = 0.64$
$\dfrac{\sum x_i^2}{N} - \left(\dfrac{\sum x_i}{N}\right)^2 = 0.64$
$\dfrac{\sum x_i^2}{8} - (4.7)^2 = 0.64$
$\dfrac{\sum x_i^2}{8} - 22.09 = 0.64$
$\dfrac{\sum x_i^2}{8} = 22.73$
$\sum x_i^2 = 181.84$
Now, replacing the incorrect value, $\sum x_i^2 = 181.84 - (3.6)^2 + (6.3)^2 = 208.57$
Variance = $\dfrac{\sum x_i^2}{N} - \left(\dfrac{\sum x_i}{N}\right)^2 = \dfrac{208.57}{8} - (5.0375)^2 = 26.0712 - 25.3764 = 0.6948$
Correct Standard Deviation = $\sqrt{0.6948} = 0.8336$ cm
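
The correction procedure (recover Σx and Σx² from the reported mean and S.D., swap the wrong reading, recompute) can be sketched as below. The population form of the variance is assumed, as in the working above, and the names are illustrative.

```python
from math import sqrt

n, reported_mean, reported_sd = 8, 4.7, 0.8
wrong, right = 3.6, 6.3

# Recover the sums implied by the reported statistics.
sum_x = n * reported_mean
sum_x2 = n * (reported_sd**2 + reported_mean**2)

# Replace the wrong reading with the right one.
sum_x_corrected = sum_x - wrong + right
sum_x2_corrected = sum_x2 - wrong**2 + right**2

corrected_mean = sum_x_corrected / n
corrected_variance = sum_x2_corrected / n - corrected_mean**2
corrected_sd = sqrt(corrected_variance)

print(round(corrected_mean, 4), round(corrected_sd, 4))
```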

UNIT 5
1. Define independent events.
Ans. 1 Two events are said to be independent if the occurrence of one does not affect the
probability of occurrence of the other.
Example: If we roll a die twice, the outcomes of the first roll and the second roll have no
effect on each other; they are independent.
2.The probability of Mr. Sunil solving a problem is ¾. The probability of Mr. Anish
solving is ¼. What is the probability that a given problem will be solved?
Ans.2
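Probability of Mr. Sunil solving the problem, P(A) = 3/4
Probability of Mr. Anish solving the problem, P(B) = 1/4
$P(\bar{A}) = 1/4$
$P(\bar{B}) = 3/4$

Assuming the two attempts are independent, the problem remains unsolved only if both fail:
Required Probability = $P(A \cup B)$
= $1 - P(\bar{A})\,P(\bar{B})$
= 1 – (1/4)(3/4)
= 1 – 3/16
= 13/16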

3. The probability that a contractor will get an electrical job is 0.8, he will get a plumbing
job is 0.6 and he will get both 0.48. What is the probability that he gets at least one job?
Are the probabilities of getting electrical job and plumbing job independent?
Ans.3 Let,
A: Contractor gets an electrical contract
B: Contractor gets a plumbing contract
Then, P(A) = 0.8, P(B) = 0.6 and $P(A \cap B) = 0.48$

Therefore, P[he gets at least one job] = $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
= 0.8 + 0.6 – 0.48 = 0.92

Yes, the events of getting the electrical job and the plumbing job are independent, because
$P(A) \times P(B) = 0.8 \times 0.6 = 0.48 = P(A \cap B)$.
4. A box contains 4 red and 5 blue similar rings. What is the probability of selecting at
random two rings:
i. having same colour
ii. having different colours
Ans.4
i. Probability of selecting 2 red rings = $\dfrac{\binom{4}{2}}{\binom{9}{2}} = \dfrac{(4 \times 3)/(2 \times 1)}{(9 \times 8)/(2 \times 1)} = \dfrac{6}{36} = \dfrac{1}{6}$

Probability of selecting 2 blue rings = $\dfrac{\binom{5}{2}}{\binom{9}{2}} = \dfrac{(5 \times 4)/(2 \times 1)}{(9 \times 8)/(2 \times 1)} = \dfrac{10}{36} = \dfrac{5}{18}$

P (2 rings of same colour) = P(2 red) + P(2 blue) = 1/6 + 5/18 = 4/9

ii. P (2 rings of different colours) = $\dfrac{\binom{4}{1}\binom{5}{1}}{\binom{9}{2}} = \dfrac{4 \times 5}{36} = \dfrac{20}{36} = \dfrac{5}{9}$
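
The counting argument can be checked with exact fractions. A minimal sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, blue = 4, 5
total_pairs = comb(red + blue, 2)  # ways to choose any 2 of the 9 rings

# Same colour: both red or both blue. Different colours: one of each.
p_same = Fraction(comb(red, 2) + comb(blue, 2), total_pairs)
p_different = Fraction(red * blue, total_pairs)

print(p_same, p_different)
```

The two events are complementary, so their probabilities add up to 1, which is a convenient check on the hand calculation.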

5. If $P(A \cap B) = 1/2$ and P(B) = 2/3, find P(A/B).

Ans.5 $P(A/B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{1/2}{2/3} = \dfrac{3}{4}$

6. The probability that a company A will survive for 20 years is 0.6. The probability that
its sister concern will survive for 20 years is 0.8. What is the probability that at least one
of them will survive for 20 years?
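Ans.6
A: Company A will survive for 20 years
B: Its sister concern (Company B) will survive for 20 years
Then, P(A) = 0.6, P(B) = 0.8 and, assuming the two companies survive independently of each
other, $P(A \cap B) = 0.6 \times 0.8 = 0.48$

Therefore, P[at least one of them survives for 20 years] = $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
= 0.6 + 0.8 – 0.48 = 0.92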

7. A recently developed car has two important components A and B. The probability of
failure of A and B are 0.2 and 0.1. What is the probability that the car will fail?
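Ans.7
A: Failure of Component A
B: Failure of Component B
Then, P(A) = 0.2, P(B) = 0.1 and, assuming the components fail independently of each other,
$P(A \cap B) = 0.2 \times 0.1 = 0.02$

The car fails if at least one component fails. Therefore, P[the car will fail] =
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
= 0.2 + 0.1 – 0.02 = 0.28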

8. The probability that a football player will play on ordinary ground is 0.6 and on green
turf is 0.4. The probability that he will get knee injury when playing an ordinary ground
is 0.07 and that on green turf is 0.04. What is the probability that he got a knee-injury
due to the play on ordinary ground?
Ans.8
P(OG) = 0.6
P(GT) = 0.4
P (Knee Injury | OG) = 0.07
P (Knee Injury | GT) = 0.04
By Bayes' theorem,
P (OG | Knee Injury) = $\dfrac{P(OG)\,P(\text{Knee Injury} \mid OG)}{P(OG)\,P(\text{Knee Injury} \mid OG) + P(GT)\,P(\text{Knee Injury} \mid GT)}$
= (0.6 × 0.07) / ((0.6 × 0.07) + (0.4 × 0.04))
= 0.042 / 0.058
= 42 / 58
= 21 / 29
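
The Bayes' theorem calculation can be reproduced with exact fractions. In this sketch the dictionary keys and variable names are illustrative; the probabilities are those given in the question.

```python
from fractions import Fraction

# Prior probability of playing on each surface.
prior = {"ordinary": Fraction(6, 10), "turf": Fraction(4, 10)}
# Probability of a knee injury given the surface.
injury_given_surface = {"ordinary": Fraction(7, 100), "turf": Fraction(4, 100)}

# Total probability of a knee injury (the denominator in Bayes' theorem).
p_injury = sum(prior[s] * injury_given_surface[s] for s in prior)

# Posterior probability that the injury happened on ordinary ground.
p_ordinary_given_injury = prior["ordinary"] * injury_given_surface["ordinary"] / p_injury

print(p_injury, p_ordinary_given_injury)
```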

UNIT 6
1. What are the assumptions under which binomial distribution is applied?
Ans.1 The following are assumptions under which a Binomial distribution can be applied:
i) The outcome of an experiment should be of dichotomous nature. In the Bernoulli process,
there must be only two possible outcomes on each trial, such as ‘success’ or ‘failure’, ‘yes’ or
‘no’, ‘defective’ or ‘not defective’, ‘male’ or ‘female’, ‘pass’ or ‘fail’, ‘favourable’ or
‘unfavourable’, etc. In this experiment, the probability of success is denoted by ‘p’ and
probability of failure is denoted by ‘q’.

ii) The probability of success should remain the same across the experiments. Irrespective of
the number of times the experiment is conducted, the probability of success should be same
for all the trials of the experiment. For example, the probability of getting a head is always 0.5
irrespective of the number of times a fair coin is tossed.

iii) Experiments should be conducted under identical conditions. There should not be any
change in conditions while conducting binomial experiments. Any change in conditions only
leads to incorrect conclusions for the given experiment.

iv) Experiments should be statistically independent. We can apply a Binomial distribution only
when the events in an experiment are statistically independent, which means occurrence of
one event does not affect the occurrence of other event.
2. Find P(X = 2), given mean and standard deviation of the binomial distribution are 4
and 3 respectively.
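Ans.2 Given, Mean (np) = 4. The value 3 is taken as the variance, npq = 3; a standard deviation
of 3 would give a variance of 9, which is larger than the mean and therefore impossible for a
binomial distribution.
$q = \dfrac{\text{variance}}{\text{mean}} = \dfrac{npq}{np} = \dfrac{3}{4}$
$p = 1 - q = \dfrac{1}{4}$
$n = \dfrac{\text{mean}}{p} = \dfrac{4}{1/4} = 16$
The probability mass function (PMF) for a binomial distribution:
$P(X = k) = \binom{n}{k}\, p^k\, q^{n-k}$
For P(X = 2):
$P(X = 2) = \binom{16}{2} \left(\dfrac{1}{4}\right)^2 \left(\dfrac{3}{4}\right)^{14}$
$= 120 \times (0.25)^2 \times (0.75)^{14} \approx 0.1336$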
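
The parameters and the required probability can be recovered numerically. This sketch assumes the value 3 is the variance (npq), since a binomial variance cannot exceed its mean; the helper `binomial_pmf` is illustrative.

```python
from math import comb

mean, variance = 4, 3

# For a binomial distribution, variance / mean = q and mean / p = n.
q = variance / mean
p = 1 - q
n = round(mean / p)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_two = binomial_pmf(2, n, p)
print(n, p, q, round(p_two, 4))
```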
4. Give real life examples of Poisson variate.
Ans.4 The number of customers arriving at a bank or a service counter in a fixed time.
Assuming customers arrive independently and at a constant average rate, the Poisson
distribution can help estimate the probability of a specific number of arrivals.

5. If the first two terms of a Poisson distribution are 150 and 90, find P (X = 0).
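Ans.5 The probability mass function (PMF) of a Poisson distribution is given by:
$P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$
where X is the random variable, k is the number of events, and λ is the average rate of events.
The first two terms, 150 and 90, are expected frequencies, that is, $N \cdot P(X = 0)$ and
$N \cdot P(X = 1)$ for a total frequency N.
The first term (k = 0) gives us:
$N \cdot P(X = 0) = N \cdot \dfrac{e^{-\lambda} \lambda^0}{0!} = N e^{-\lambda} = 150$
The second term (k = 1) gives us:
$N \cdot P(X = 1) = N \cdot \dfrac{e^{-\lambda} \lambda^1}{1!} = N \lambda e^{-\lambda} = 90$
Dividing the second equation by the first, we get
$\dfrac{N \lambda e^{-\lambda}}{N e^{-\lambda}} = \dfrac{90}{150}$
$\lambda = \dfrac{3}{5} = 0.6$
$P(X = 0) = e^{-\lambda} = e^{-0.6} = 0.5488$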
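
The ratio trick for recovering λ can be sketched as follows, treating 150 and 90 as the frequencies of X = 0 and X = 1. The helper `poisson_pmf` is illustrative; the last line also evaluates the phone-booth case with λ = 2.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Frequencies of the first two terms (k = 0 and k = 1).
freq_0, freq_1 = 150, 90

# The ratio of consecutive terms f(1) / f(0) equals lambda.
lam = freq_1 / freq_0
p_zero = poisson_pmf(0, lam)

# Exactly one call in an hour when the average is 2 calls per hour.
p_one_call = poisson_pmf(1, 2)

print(lam, round(p_zero, 4), round(p_one_call, 4))
```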

6. The average number of phone calls at a booth per hour is 2. What is the probability
that there will be exactly one call in an hour?
Ans.6 Using the PMF $P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$ with k = 1 and λ = 2:
$P(X = 1) = \dfrac{e^{-2} \cdot 2^1}{1!} = \dfrac{2}{e^2} = \dfrac{2}{7.3891} = 0.2707$
7. The probability that a firm’s product will succeed its competitor’s product is 2/3. If in
a month it has introduced 4 products, what is the probability that:
i) Two products succeed the competitor’s product?
ii) All products succeed the competitor’s product?
Ans.7
P (Firm’s product will succeed its competitor’s product) = 2/3
P (Firm’s product will not succeed its competitor’s product) = 1/3

i. P (Two Products Succeed) = P (Two successes and two failures) =
$\binom{4}{2} \left(\dfrac{2}{3}\right)^2 \left(\dfrac{1}{3}\right)^2 = 6 \times \dfrac{4}{9} \times \dfrac{1}{9} = \dfrac{24}{81} = \dfrac{8}{27}$
ii. P (All Products Succeed) = $\left(\dfrac{2}{3}\right)^4 = \dfrac{16}{81}$

8. Mean life of electric bulbs produced by a company is 1500 hours with a standard
deviation of 300 hours. Assuming that the life of bulbs follows normal distribution,
what is the probability that a randomly selected bulb will:
i) Fail within 1200 hours?
ii) Survive between 1350 and 1650 hours?
iii) Survive beyond 1950 hours?
Ans.8
Given:
Mean (μ) = 1500 hours
Standard Deviation (σ) = 300 hours

i) Probability that a bulb fails within 1200 hours:
$z = \dfrac{x - \mu}{\sigma} = \dfrac{1200 - 1500}{300} = -1$
From the standard normal distribution table, P(Z < −1) is approximately 0.1587.

ii) Probability that a bulb survives between 1350 and 1650 hours:
$z_1 = \dfrac{1350 - 1500}{300} = -0.5$
$z_2 = \dfrac{1650 - 1500}{300} = 0.5$
From the table, P(Z < −0.5) and P(Z < 0.5) are approximately 0.3085 and 0.6915, respectively.
The required probability is approximately 0.6915 − 0.3085 = 0.3830.

iii) Probability that a bulb survives beyond 1950 hours:
$z = \dfrac{1950 - 1500}{300} = 1.5$
From the table, P(Z < 1.5) is approximately 0.9332. To find the probability of surviving beyond
1950 hours, we subtract this value from 1:
P (Survive beyond 1950 hours) = 1 − 0.9332 = 0.0668
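
The table look-ups can be replaced by the standard library's normal distribution. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Bulb life modelled as Normal(mean=1500 hours, sd=300 hours).
life = NormalDist(mu=1500, sigma=300)

p_fail_within_1200 = life.cdf(1200)
p_between_1350_1650 = life.cdf(1650) - life.cdf(1350)
p_beyond_1950 = 1 - life.cdf(1950)

print(round(p_fail_within_1200, 4),
      round(p_between_1350_1650, 4),
      round(p_beyond_1950, 4))
```

The computed values may differ from the table-based ones in the fourth decimal place, because printed tables are themselves rounded.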

9. Write short notes on Normal distribution.


Ans.9 The normal distribution, also known as the Gaussian distribution or bell curve, is a
fundamental concept in statistics and probability theory. It plays a crucial role in various fields,
including science, economics, engineering, and social sciences. The normal distribution is
characterized by its symmetric bell-shaped curve.
The probability density function (pdf) of the normal distribution is given by:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where:
• μ is the mean,
• σ is the standard deviation,
• e is the base of the natural logarithm.

10. The height of students follows Normal distribution. 15% of them have height less
than 150 cm and 10% have height above 180 cm. Find the mean and standard deviation
of the distribution.
Ans.10
Given:
1. P(X < 150) = 0.15 (15% have height less than 150 cm)
2. P(X > 180) = 0.10 (10% have height above 180 cm)

Using Z-scores:

For the first condition:
$P(X < 150) = P\left(Z < \dfrac{150 - \mu}{\sigma}\right) = 0.15$
Using the standard normal distribution table, Z = −1.04.

For the second condition:
$P(X > 180) = P\left(Z > \dfrac{180 - \mu}{\sigma}\right) = 0.10$
Using the standard normal distribution table, Z = 1.28.

Using Z-score Formulas:

1. For the first condition:
$-1.04 = \dfrac{150 - \mu}{\sigma}$
$\mu = 150 + 1.04\,\sigma$
2. For the second condition:
$1.28 = \dfrac{180 - \mu}{\sigma} = \dfrac{180 - (150 + 1.04\,\sigma)}{\sigma}$
$2.32\,\sigma = 30$
$\sigma = \dfrac{30}{2.32} = 12.93$

Substitute σ back into the first equation to find μ:
$\mu = 150 + 1.04 \times 12.93$
$\mu = 163.45$
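
The two simultaneous equations can also be solved numerically. This sketch uses `statistics.NormalDist().inv_cdf` (Python 3.8+) for the z-values instead of a printed table, so the results differ slightly from a calculation based on the rounded values −1.04 and 1.28; the variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal, mean 0 and sd 1

# z-values for the 15th and 90th percentiles.
z_low = standard.inv_cdf(0.15)
z_high = standard.inv_cdf(0.90)

# 150 = mu + z_low * sigma and 180 = mu + z_high * sigma.
sigma = (180 - 150) / (z_high - z_low)
mu = 150 - z_low * sigma

print(round(z_low, 2), round(z_high, 2), round(mu, 1), round(sigma, 1))
```

A quick consistency check is that both conditions give the same mean: 150 − z_low·σ and 180 − z_high·σ must agree.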

UNIT 7
1. Discuss the errors that arise in statistical survey.
Ans.1 There are four types of error:
1. Sampling errors: The sample results are bound to differ from population results, since a
sample is only a small portion of the population. This error is also known as inherent error and
cannot be avoided; it is not worthwhile to try to eliminate it completely. These errors may be
due to the following factors:
• Faulty selection of sample
• Substitution of units to be studied
• Faulty demarcation of sampling units
• Error due to bias in estimation
However, the sampling errors follow random or chance variations and tend to cancel out
each other on averaging.

2. Non-sampling errors: Non-sampling errors are attributed to factors that can be controlled
and eliminated by suitable actions. They are due to the following factors:
• Faulty planning, faulty definitions
• Defective methods of interviewing
• Personal bias of investigator
• Lack of trained and qualified investigators
• Respondents failure to answer.
• Improper coverage
• Compiling errors
• Publication errors
It is worthwhile to eliminate these errors.

3. Biased errors: Biased errors arise in both census and sampling methods. These errors occur
due to the personal bias of the investigator and the instruments used for measuring. They are
also due to faulty collection of data, respondents’ bias and bias due to non-response. Biased
errors have a tendency to grow with sample size. Therefore, they are also known as cumulative
errors. The magnitude of biased errors is directly proportional to the sample size.

4. Unbiased errors: The errors that are due to over-estimation and under-estimation, such that
they are equal are known as unbiased errors. They are also known as compensatory errors.
They do not increase with sample size.

2. Describe simple random sampling.


Ans.2 Simple random sampling: Under this technique, sample units are drawn in such a way
that each and every unit in the population has an equal and independent chance of being
included in the sample. If a sample unit is replaced before drawing the next unit, then it is
known as simple random sampling with replacement [SRSWR]. If the sample unit is not
replaced before drawing the next unit, then it is called simple random sampling without
replacement [SRSWOR]. In the first case, the probability of drawing a given unit at each draw
is 1/N, where N is the population size. In the second case, the probability of drawing a given
remaining unit at the k-th draw is 1/(N − k + 1), since k − 1 units have already been removed.
A simple random sample can be selected in the following ways:

• Lottery method – In lottery method, we identify each and every unit with distinct numbers
by allotting an identical card. The cards are put in a drum and thoroughly shuffled before each
unit is drawn. The figure depicts a lotto machine through which samples can be selected
randomly.

• The use of a table of random numbers – There are several random number tables, such as
Tippett’s random number table, Fisher and Yates’ tables, Kendall and Babington Smith’s
random tables, and the Rand Corporation random numbers. The table depicts a specimen of
Tippett’s random numbers.

3. Describe systematic sampling.


Ans.3 This design is recommended if we have a complete list of sampling units arranged in
some systematic order such as geographical, chronological or alphabetical order.
Suppose the population size is ‘N’. The population units are serially numbered ‘1’ to ‘N’ in
some systematic order and we wish to draw a sample of ‘n’ units. Then we divide units from
‘1’ to ‘N’ into ‘n’ groups such that each group has ‘K’ units.
This implies ‘nK = N’ or ‘K = N/n’. From the first group, we select a unit at random. Suppose
the unit selected is the 6th unit; thereafter we select every Kth unit, that is, units 6 + K,
6 + 2K and so on. If ‘K’ is 20, ‘n’ is 5 and ‘N’ is 100, then the units selected are 6, 26, 46,
66, 86.
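The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and its `start` parameter are hypothetical, and the sketch assumes that N is an exact multiple of n.

```python
import random


def systematic_sample(N, n, start=None, rng=random):
    """Return the 1-based unit numbers of a systematic sample of size n from units 1..N."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    K = N // n  # sampling interval
    if start is None:
        start = rng.randint(1, K)  # random start within the first K units
    if not 1 <= start <= K:
        raise ValueError("start must lie between 1 and K")
    # Take the random start and then every K-th unit after it.
    return [start + i * K for i in range(n)]


# Reproduces the worked example: N = 100, n = 5, K = 20, random start 6.
print(systematic_sample(100, 5, start=6))
```

Leaving `start` unset draws the random start from the first K units, which is the only random step in the design.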
4. What is quota sampling and when do we use it?
Ans.4 It is a type of judgment sampling. Under this design, quotas are set up according to
some specified characteristic such as age groups or income groups. From each group a
specified number of units are sampled according to the quota allotted to the group. Within the
group the selection of sample units depends on personal judgment. It has a risk of personal
prejudice and bias entering the process. This method is often used in public opinion studies.

5. What are the basic principles on which sampling theory is based?


Ans.5 The five important principles of sampling theory are as follows:
1. Law of statistical regularity: The law of statistical regularity states that a group of units
chosen at random from a large group tends to possess the characteristics of that large group.
Suppose a particular characteristic of the population has a particular shape, then the same
characteristics will also follow the same shape in the sample.

2. Principle of inertia of large numbers: This principle states that “other things being equal,
as the sample size increases, the results tend to be more reliable and accurate”. Suppose that
the population mean is 25 units. If a sample of size 50 gives an average of 24.5 units, then a
larger sample of size 100 can be expected to give an average closer to 25, say 24.8 units. In
other words, the larger the sample size, the more accurate the result tends to be.

3. Principle of persistence of small numbers: If some of the units in a population possess
markedly distinct characteristics, then this will be reflected in the sample values also. For
example, if there are 300 blind persons in a population of 10,000 persons, then a sample of
one hundred will have more or less the same proportion (about 3) of blind persons in it.

4. Principle of validity: A sampling design is said to be valid if it enables us to obtain valid
tests and estimates of population parameters.

5. Principle of optimisation: This principle aims at obtaining a desired level of efficiency at
minimum cost or obtaining the maximum possible efficiency with a given level of cost.

6. Explain the sampling distribution of a statistic and its standard error.
Ans.6
Sampling Distribution of a Statistic: When we collect a sample from a population and
calculate a statistic (e.g., mean, proportion, standard deviation) based on that sample, the value
we obtain is just one possible realization of that statistic. If we were to repeat this process with
multiple samples from the same population and calculate the statistic for each sample, we
would create a distribution of those statistics. This distribution is called the sampling
distribution of the statistic.
For example, if we are interested in the sample mean (𝑥̅), the sampling distribution of the
sample mean would show all possible values of 𝑥̅that could be obtained from different samples
of the same size.

Standard Error of a Statistic: The standard error (SE) of a statistic measures the variability
of the sampling distribution of that statistic. It provides an estimate of how much the sample
statistic is expected to vary from the true population parameter.
For the sample mean (x̄), the standard error (often denoted as SE(x̄) or σ_x̄) is calculated as
the standard deviation of the population divided by the square root of the sample size:
$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
Where:
• σ is the population standard deviation.
• n is the sample size.
This formula indicates that larger sample sizes result in smaller standard errors, meaning that
the sample mean is more likely to be close to the population mean.
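A small simulation can make the idea of a sampling distribution concrete. The sketch below is illustrative only: the population (the integers 1 to 100), the sample size 25, the number of repetitions and the function name `simulate_standard_error` are all assumptions chosen for the example. It draws many samples with replacement, records each sample mean, and compares the spread of those means with σ/√n.

```python
import math
import random
import statistics


def simulate_standard_error(population, n, draws=5000, seed=0):
    """Estimate the standard error of the mean by repeated sampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]
    # The standard deviation of the simulated sample means approximates SE(x-bar).
    return statistics.pstdev(means)


population = list(range(1, 101))       # hypothetical population
sigma = statistics.pstdev(population)  # population standard deviation
n = 25
theoretical = sigma / math.sqrt(n)     # SE from the formula sigma / sqrt(n)
empirical = simulate_standard_error(population, n)
print(round(theoretical, 3), round(empirical, 3))
```

The two printed values should be close to each other, and increasing `n` should shrink both.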

7. The distribution of employees in three plants of a manufacturing unit is depicted in
table 7.8. Using random numbers discussed under topic ‘simple random sampling’, draw
a random sample of size 15.

Table 7.8: Distribution of Employees in Three Manufacturing Plants

Plant                  0-5    5-10    10-15
Number of Employees    4      6       10

Ans.7 Let's use the given distribution of employees in three manufacturing plants and draw a
random sample of size 15 using random numbers.
Distribution:
• Plant 0-5: 4 employees
• Plant 5-10: 6 employees
• Plant 10-15: 10 employees

Step 1: Assign Random Numbers:


• Plant 0-5: Assign random numbers 1 to 4.
• Plant 5-10: Assign random numbers 5 to 10.
• Plant 10-15: Assign random numbers 11 to 20.
Step 2: Generate Random Numbers:
Let us generate 15 random numbers between 1 and 20 (inclusive).
Generated random numbers: 3, 8, 15, 6, 11, 2, 5, 14, 7, 10, 1, 19, 4, 9, 12.
Step 3: Assign Employees to Categories:
• Random numbers 3, 2, 1, 4 belong to the 0-5 category.
• Random numbers 8, 6, 5, 7, 10, 9 belong to the 5-10 category.
• Random numbers 15, 11, 14, 19, 12 belong to the 10-15 category.
Random Sample of Size 15:
• 0-5 category: 4 employees
• 5-10 category: 6 employees
• 10-15 category: 5 employees

So, based on the random numbers generated, you would select 4 employees from the 0-5
category, 6 employees from the 5-10 category, and 5 employees from the 10-15 category,
resulting in a total random sample of size 15.

8. Population proportion of tea drinkers is 0.6. Determine the sample size such that the
error between actual and observed proportion will be less than or equal to 0.05 with 95
% confidence, (Z = 1.96).
Ans.8
Given:
• Z = 1.96 (for a 95% confidence level)
• p = 0.6 (population proportion)
• E = 0.05 (margin of error)

$$n = \frac{Z^2 \times p \times (1 - p)}{E^2}$$
$$n = \frac{(1.96)^2 \times 0.6 \times (1 - 0.6)}{(0.05)^2}$$
$$n = \frac{3.8416 \times 0.6 \times 0.4}{0.0025} = \frac{0.9220}{0.0025} = 368.79 \approx 369$$
9. The standard error of mean of bursting strength of card boards produced by a
company is 1.5 units. If the population standard deviation is √𝟓𝟎, find the sample size.
Ans.9
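Given:
• σ = √50 (population standard deviation)
• SE = 1.5 (standard error of the mean)
Since SE = σ/√n,
$$n = \left(\frac{\sigma}{SE}\right)^2 = \left(\frac{\sqrt{50}}{1.5}\right)^2 = \frac{50}{2.25} = 22.22 \approx 23$$
(n is rounded up so that the standard error does not exceed 1.5.)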
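The two sample-size calculations above follow the same pattern: solve the error formula for n and round up. A minimal sketch, assuming hypothetical helper names `sample_size_for_proportion` and `sample_size_for_standard_error`:

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin."""
    n = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n)  # round up: a fractional unit cannot be sampled


def sample_size_for_standard_error(sigma, se):
    """Smallest n with sigma / sqrt(n) <= se."""
    return math.ceil((sigma / se) ** 2)


print(sample_size_for_proportion(0.6, 0.05))              # tea drinkers example
print(sample_size_for_standard_error(math.sqrt(50), 1.5)) # bursting strength example
```

Rounding up rather than to the nearest integer keeps the achieved error at or below the target.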

UNIT8

1. XYZ bank is determining the number of tellers available during the Friday lunch rush
hour. The bank has collected data on the number of people who entered the bank during
the past three months, on Fridays from 11 am to 1 pm. Using the data from table 8.6,
find the point estimates of the mean and standard deviation of the population from which
the sample was drawn.

Table 8.6: Data of the Number of People entered into XYZ Bank
242 275 289 306 342 385
279 245 269 305 294 328

Ans.1
$$\bar{x} = \frac{\sum x_i}{n} = \frac{242+275+289+306+342+385+279+245+269+305+294+328}{12} = \frac{3559}{12} = 296.58$$

Standard deviation (s) = √(variance) = √(s²)
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(242-296.583)^2 + (275-296.583)^2 + \dots + (328-296.583)^2}{11} = \frac{18266.917}{11}$$
$$s^2 = 1660.629$$
Standard deviation (s) = √1660.629 = 40.751
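The point estimates can be checked with Python's standard `statistics` module. This is a sketch rather than part of the original solution; the variable names are illustrative. Note that `statistics.stdev` uses the n − 1 denominator, which matches the sample formula used here.

```python
import statistics

# Friday lunch-hour counts from Table 8.6
counts = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]

mean_estimate = statistics.mean(counts)     # point estimate of the population mean
sd_estimate = statistics.stdev(counts)      # sample standard deviation (n - 1 denominator)
var_estimate = statistics.variance(counts)  # sample variance s^2

print(round(mean_estimate, 3), round(var_estimate, 3), round(sd_estimate, 3))
```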
2. From a population known to have standard deviation of 1.4, a sample of 60 individuals
is taken. The mean of this sample is found to be 6.2.
i) Find the standard error of the mean.
ii) Establish an interval estimate around the sample mean using one standard deviation
of the mean.
Ans.2
Given:
- Population standard deviation (𝜎) = 1.4
- Sample size (n) = 60
- Sample mean (𝑥̅) = 6.2
i) Standard Error of the Mean (SEM):
$$SEM = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{60}} = \frac{1.4}{7.746} = 0.181$$

ii) Interval Estimate using One Standard Error of the Mean:
Interval Estimate = x̄ ± SEM = 6.2 ± 0.181
This yields the interval estimate (6.019, 6.381)

3. On collecting a sample of 250 from a population with a known standard deviation of


13.7, the mean is found to be 112.4.
i) Find a 95% confidence level interval for the mean.
ii) Find a 99% confidence level interval for the mean.
Ans.3 The confidence interval for the mean is
$$\bar{x} \pm Z\left(\frac{\sigma}{\sqrt{n}}\right)$$
Given:
• Sample mean (x̄) = 112.4
• Population standard deviation (σ) = 13.7
• Sample size (n) = 250

i. For a 95% confidence level, the Z-score (Z) is approximately 1.96.
$$\text{Confidence Interval}_{95\%} = 112.4 \pm 1.96\left(\frac{13.7}{\sqrt{250}}\right) = 112.4 \pm 1.698$$
95% Confidence Interval: (110.702, 114.098)

ii. For a 99% confidence level, the Z-score (Z) is approximately 2.576.
$$\text{Confidence Interval}_{99\%} = 112.4 \pm 2.576\left(\frac{13.7}{\sqrt{250}}\right) = 112.4 \pm 2.232$$
99% Confidence Interval: (110.168, 114.632)
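The same intervals can be reproduced with the standard library, which also supplies the Z-score for any confidence level. This is a minimal sketch; the function name `z_confidence_interval` is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    # Two-sided critical value, e.g. about 1.96 for confidence = 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


low95, high95 = z_confidence_interval(112.4, 13.7, 250, 0.95)
low99, high99 = z_confidence_interval(112.4, 13.7, 250, 0.99)
print(round(low95, 3), round(high95, 3))
print(round(low99, 3), round(high99, 3))
```

A higher confidence level widens the interval because a larger Z multiplies the same standard error.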

UNIT 9

1. Twenty households out of 1000 were using Brand ‘A’ toothpaste. The company
increased the price of the brand. In a survey, they found that only 12 households out of
1000 are using it now. Can we conclude at 5% level of significance that proportion of
users has decreased?
Ans.1.
Given:
- p1 = 20/1000 = 0.020 (proportion before),
- p2 = 12/1000 = 0.012 (proportion after),
- n1 = n2 = 1000 (sample sizes),
- Significance level α = 0.05

Hypotheses: H0: p1 = p2 (no change); H1: p1 > p2 (the proportion of users has decreased).

Pooled Sample Proportion:
$$p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{20 + 12}{1000 + 1000} = \frac{32}{2000} = 0.016$$

Standard Error (SE):
$$SE = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.016(1 - 0.016)\left(\frac{1}{1000} + \frac{1}{1000}\right)}$$
$$SE = \sqrt{0.016 \times 0.984 \times 0.002} \approx 0.0056$$

z-test statistic:
$$z = \frac{p_1 - p_2}{SE} = \frac{0.020 - 0.012}{0.0056} = \frac{0.008}{0.0056} \approx 1.43$$

Compare with the critical value:
The critical value for a one-tailed test at the 5% level of significance is approximately 1.645.
Since 1.43 < 1.645, we do not reject the null hypothesis (H0) at the 5% significance level: the
data do not show that the proportion of users has decreased.
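The pooled two-proportion z-test used in this answer can be sketched as a short function. The name `two_proportion_z` is hypothetical, and the critical value is taken from the standard normal distribution in the standard library.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_proportion_z(20, 1000, 12, 1000)       # toothpaste users before and after
one_tailed_critical = NormalDist().inv_cdf(0.95)  # about 1.645 at the 5% level
print(round(z, 3), round(one_tailed_critical, 3), z > one_tailed_critical)
```

For a two-tailed test at level α, the critical value would instead be `NormalDist().inv_cdf(1 - alpha / 2)`.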

2. A drill drills holes with standard deviation of depth 0.03cms. It is adjusted to drill holes
of depth 5.5cm. For 50 holes drilled, the mean depth is 5.503cm. Test at 5% level of
significance whether the adjustment is correct.
Ans.2
Given:
- Population standard deviation (𝜎) = 0.03 cm,
- Sample mean (x̅) = 5.503 cm,
- Sample size (n) = 50,
- Hypothesized population mean (𝜇) = 5.5 cm,
- Significance level 𝛼 = 0.05.
z-test statistic:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{5.503 - 5.5}{0.03/\sqrt{50}} = \frac{0.003}{0.00424} = 0.707$$
Compare with the critical value:
The critical value for a two-tailed test at the 5% level of significance is approximately 1.96.
Since 0.707 < 1.96, we do not reject the null hypothesis (H0) at the 5% significance level; the
adjustment can be taken as correct.
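The one-sample z statistic is simple enough to wrap in a helper. This sketch uses a hypothetical function name, `one_sample_z`, and applies it to the drill example.

```python
import math


def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for H0: population mean = mu0, with known sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


z_drill = one_sample_z(5.503, 5.5, 0.03, 50)
# Two-tailed test at the 5% level: reject H0 only if |z| exceeds 1.96.
print(round(z_drill, 3), abs(z_drill) > 1.96)
```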

3. Out of 80 batteries produced by process I, three were found to be defective. In another
sample of 130 produced by process II, two were found to be defective. Test whether the
proportion of defectives in two processes differs, using 1% level of significance.
Ans.3
Given:
• n1 = 80
• n2 = 130
• x1 = 3
• x2 = 2
• Significance level α = 0.01

Sample Proportions:
$$p_1 = \frac{3}{80} = 0.0375, \qquad p_2 = \frac{2}{130} = 0.0154$$

Pooled Sample Proportion:
$$p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{3 + 2}{80 + 130} = \frac{5}{210} = 0.0238$$

Standard Error (SE):
$$SE = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.0238(1 - 0.0238)\left(\frac{1}{80} + \frac{1}{130}\right)}$$
$$SE = \sqrt{0.0238 \times 0.9762 \times 0.02019} = \sqrt{0.000469} \approx 0.02166$$

z-test statistic:
$$z = \frac{p_1 - p_2}{SE} = \frac{0.0375 - 0.0154}{0.02166} = \frac{0.0221}{0.02166} \approx 1.02$$

Compare with the critical value:
The critical value for a two-tailed test at the 1% level of significance is approximately 2.576.
Since 1.02 < 2.576, we do not reject the null hypothesis (H0) at the 1% significance level: the
proportions of defectives in the two processes do not differ significantly.
4. The table 9.8 depicts the data related to mean weight of a product. Test whether there
is a significant difference in means of the plants.

Table 9.8: Mean Weight of a Product

            Plant A    Plant B
Size        300        200
Mean        75.4       74.3
Variance    65.6       57.8
Ans.4 Given:
• N_A = 300
• N_B = 200
• X̄_A = 75.4
• X̄_B = 74.3
• σ_A² = 65.6; σ_B² = 57.8 (standard deviation = √variance)

z-test statistic:
$$Z_{Cal} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{\sigma_A^2}{N_A} + \frac{\sigma_B^2}{N_B}}} = \frac{75.4 - 74.3}{\sqrt{\frac{65.6}{300} + \frac{57.8}{200}}} = \frac{1.1}{\sqrt{0.2187 + 0.2890}} = \frac{1.1}{0.7125}$$
Z_Cal = 1.54
The critical value for a two-tailed test at the 5% level of significance is approximately 1.96.
Since 1.54 < 1.96, we do not reject the null hypothesis (H0) at the 5% significance level: the
difference between the plant means is not significant.

5. A machine is set to produce particular characteristics with mean 21.3 and S.D 0.4. A
random sample of 625 observations has 21.33 as mean. Test whether the sample mean
differ significantly from population mean.
Ans.5 Given:
- Population standard deviation (σ) = 0.4,
- Sample mean (x̄) = 21.33,
- Sample size (n) = 625,
- Hypothesized population mean (μ) = 21.3,
- Significance level α = 0.05 (assumed, since the question does not state a level).
z-test statistic:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.33 - 21.3}{0.4/\sqrt{625}} = \frac{0.03}{0.016} = 1.875$$
Compare with the critical value:

The critical value for a two-tailed test at the 5% level of significance is approximately 1.96.
Since 1.875 < 1.96, we do not reject the null hypothesis (H0) at the 5% significance level: the
sample mean does not differ significantly from the population mean.
6. Out of 10,000 pumpkins harvested, 1000 were randomly selected. 8% were found to be
rotten. The grower claims that only 7% are rotten. Is this claim tenable? Test at 5%
level of significance.
Ans.6 Given:
• p1 = 0.08 (proportion of rotten pumpkins in the sample),
• p2 = 0.07 (claimed proportion by the grower),
• n = 1000 (sample size)
z-test statistic:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_2(1 - p_2)}{n}}} = \frac{0.08 - 0.07}{\sqrt{\frac{0.07(1 - 0.07)}{1000}}} = \frac{0.01}{\sqrt{0.0000651}} = \frac{0.01}{0.00807} = 1.239$$
The critical value at the 5% level of significance is approximately 1.96 for a two-tailed test
(1.645 for a one-tailed test). Since 1.239 is below both, we do not reject the null hypothesis
(H0) at the 5% significance level: the grower's claim is tenable.
7. A group of seven–week–old chickens reared on a high protein diet weigh 12, 15, 11, 16,
14, 14 and 16 ounces. In another group, 5 chicken received low protein diet and weigh 8,
10, 14, 10, and 13. Test whether there is significant increase in weight due to high protein,
use 5% level of significance.
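Ans.7
• X̄1 = 98/7 = 14 (mean of the high protein group)
• s1 = √(22/6) = 1.92 (standard deviation of the high protein group)
• n1 = 7
• X̄2 = 55/5 = 11 (mean of the low protein group)
• s2 = √(24/4) = 2.45 (standard deviation of the low protein group)
• n2 = 5
t-test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{14 - 11}{\sqrt{\frac{3.667}{7} + \frac{6}{5}}} = \frac{3}{\sqrt{0.524 + 1.2}} = \frac{3}{\sqrt{1.724}} \approx 2.28$$
The critical t-value for 10 degrees of freedom (n1 + n2 − 2) at the 5% level is approximately
2.228 for a two-tailed test (1.812 for a one-tailed test). Since 2.28 is greater than both, we
reject the null hypothesis and conclude that the high protein diet gives a significant increase
in weight.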

8. Table 9.9 depicts the strength test results of two yarns. Is there a significant difference in
the mean? Test at 5% level of significance.
Table 9.9: Strength Results of the Two Yarns
          Sample Size    Mean    Sample Variance
Type A    4              52      42
Type B    9              42      56
Ans.8
Given data:
• For Type A:
• n1 = 4
• X̄1 = 52
• s1² = 42
• For Type B:
• n2 = 9
• X̄2 = 42
• s2² = 56
• t-test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{52 - 42}{\sqrt{\frac{42}{4} + \frac{56}{9}}} = \frac{10}{\sqrt{10.5 + 6.22}} = \frac{10}{\sqrt{16.72}} = \frac{10}{4.09} \approx 2.45$$

The critical t-value for 11 degrees of freedom is approximately ±2.201. Since 2.45 is greater
than 2.201, we reject the null hypothesis: the mean strengths differ significantly.

9. Table 9.10 depicts the results related to the memory capacity of 10 students before
and after training. Test at 5% level of significance whether training is effective.
Table 9.10: Memory Capacity of 10 Students

Roll no.           1    2    3    4    5    6    7    8    9    10
Before Training    1    14   11   8    7    1    3    0    5    6
After Training     1    16   10   7    5    1    10   2    3    8
Ans.9
Solution:

Before Training (x1)    After Training (x2)    Difference (di) = x2 – x1    di²
1                       1                      0                            0
14                      16                     2                            4
11                      10                     -1                           1
8                       7                      -1                           1
7                       5                      -2                           4
1                       1                      0                            0
3                       10                     7                            49
0                       2                      2                            4
5                       3                      -2                           4
6                       8                      2                            4
                                               Σdi = 7                      Σdi² = 71

n = 10, n - 1 = 9
$$\text{Mean Difference} = \bar{D} = \frac{\sum d_i}{n} = \frac{7}{10} = 0.7$$
$$S_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n - 1}} = \sqrt{\frac{71 - \frac{(7)^2}{10}}{9}} = \sqrt{\frac{66.1}{9}} = \sqrt{7.34} = 2.71$$
$$t = \frac{\bar{D}}{S_d/\sqrt{n}} = \frac{0.7}{2.71/\sqrt{10}} = \frac{0.7 \times \sqrt{10}}{2.71} = \frac{2.21}{2.71} = 0.82$$

The critical t-value for 9 degrees of freedom at the 5% level is approximately ±2.262 (1.833
for a one-tailed test). Since 0.82 is less than both, we do not reject the null hypothesis: the
data do not show that the training is effective.
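The paired t statistic can be checked with a few lines of Python. This is a sketch, with the hypothetical function name `paired_t`; it uses the usual sample standard deviation of the differences, S_d = √[(Σd² − (Σd)²/n)/(n − 1)], which is what `statistics.stdev` computes. The data are the values as printed in Table 9.10.

```python
import math
import statistics

before = [1, 14, 11, 8, 7, 1, 3, 0, 5, 6]
after = [1, 16, 10, 7, 5, 1, 10, 2, 3, 8]


def paired_t(x1, x2):
    """t statistic for paired observations, testing mean difference = 0."""
    d = [b - a for a, b in zip(x1, x2)]  # after minus before
    n = len(d)
    mean_d = statistics.mean(d)
    sd = statistics.stdev(d)             # n - 1 denominator
    return mean_d / (sd / math.sqrt(n))


t_stat = paired_t(before, after)
print(round(t_stat, 3))  # compare with the t table value for n - 1 = 9 df
```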

UNIT 10
1. 400 items of each (material) were given treatment ‘x’ and ‘y’ to enhance the strength
of the material. 80 gained strength by treatment ‘x’ and 20 gained strength by treatment
‘y’. Does the gain in strength depend on the treatment?
Ans.1
Given Data:
• Items given each treatment: 400
• Items gaining strength by treatment 'x': 80
• Items gaining strength by treatment 'y': 20

Observation Table:
                       Treatment ‘x’    Treatment ‘y’    Total
Gained Strength        80               20               100
Not Gained Strength    320              380              700
Total                  400              400              800

$$\text{Expected Frequency} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
$$E_{11} = \frac{100 \times 400}{800} = 50, \qquad E_{12} = \frac{100 \times 400}{800} = 50$$
$$E_{21} = \frac{700 \times 400}{800} = 350, \qquad E_{22} = \frac{700 \times 400}{800} = 350$$
Expected Frequency Table:
                       Treatment ‘x’    Treatment ‘y’    Total
Gained Strength        50               50               100
Not Gained Strength    350              350              700
Total                  400              400              800

Chi-Square Statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(80 - 50)^2}{50} + \frac{(20 - 50)^2}{50} + \frac{(320 - 350)^2}{350} + \frac{(380 - 350)^2}{350}$$
$$\chi^2 = \frac{900}{50} + \frac{900}{50} + \frac{900}{350} + \frac{900}{350} = \frac{14400}{350} = 41.14$$
Degrees of Freedom:
df = (Number of Rows−1) × (Number of Columns−1) = (2−1) × (2−1) = 1

The critical chi-square value for 1 degree of freedom at the 5% level is approximately 3.841.
Since χ2 (41.14) is greater than the critical value (3.841), we reject the null hypothesis: the
gain in strength depends on the treatment.
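The chi-square test of independence for a contingency table can be sketched in plain Python. The function name `chi_square_independence` is an assumption for illustration; it computes each expected frequency as row total × column total / grand total, exactly as in the worked answer.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: gained / not gained strength; columns: treatment x / treatment y.
chi2, df = chi_square_independence([[80, 20], [320, 380]])
print(round(chi2, 2), df)
```

The same function applies to any r × c table of counts, since nothing in it is specific to the 2 × 2 case.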

2. The demand for a particular spare part was found to vary from day to day. Table 10.6
depicts the information obtained in a sample study. Test the hypothesis that the number
demanded depends upon the day.
Table 10.6: Spare Part Demand from Monday to Saturday
Days Mon Tue Wed Thur Fri Sat
Quantity Demanded 1124 1125 1110 1120 1126 1115
Ans.2 Given Data: Table 10.6

Null hypothesis: the number demanded does not depend on the day, so the total demand is
spread equally over the six days.

Total demand = 1124 + 1125 + 1110 + 1120 + 1126 + 1115 = 6720
$$\text{Expected Frequency} = \frac{\text{Total demand}}{\text{Number of days}} = \frac{6720}{6} = 1120$$
Expected Frequency Table:

Days                 Mon     Tue     Wed     Thur    Fri     Sat     Total
Quantity Demanded    1124    1125    1110    1120    1126    1115    6720
Quantity Expected    1120    1120    1120    1120    1120    1120    6720

Chi-Square Statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(1124 - 1120)^2 + (1125 - 1120)^2 + (1110 - 1120)^2 + (1120 - 1120)^2 + (1126 - 1120)^2 + (1115 - 1120)^2}{1120}$$
$$\chi^2 = \frac{16 + 25 + 100 + 0 + 36 + 25}{1120} = \frac{202}{1120} = 0.180$$
Degrees of Freedom:
df = (Number of Categories −1) = 6 - 1 = 5

The critical chi-square value for 5 degrees of freedom at the 5% level is approximately 11.070.
Since χ2 (0.180) is less than the critical value (11.070), we do not reject the null hypothesis:
the number demanded does not depend on the day.

3. In a survey of 200 boys, of which 75 were intelligent, 40 had skilled fathers. While 85
of the unintelligent boys had unskilled fathers. Can we say on the basis of the information
that skilled fathers had intelligent boys?
Ans.3 Given:
Observed Frequency:
                      Skilled father     Unskilled father    Total
Intelligent boys      40                 75 – 40 = 35        75
Unintelligent boys    125 – 85 = 40      85                  200 – 75 = 125
Total                 40 + 40 = 80       35 + 85 = 120       200

$$\text{Expected Frequency} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
$$E_{11} = \frac{75 \times 80}{200} = 30, \qquad E_{12} = \frac{75 \times 120}{200} = 45$$
$$E_{21} = \frac{125 \times 80}{200} = 50, \qquad E_{22} = \frac{125 \times 120}{200} = 75$$
Chi-Square Statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(40 - 30)^2}{30} + \frac{(35 - 45)^2}{45} + \frac{(40 - 50)^2}{50} + \frac{(85 - 75)^2}{75}$$
$$\chi^2 = \frac{100}{30} + \frac{100}{45} + \frac{100}{50} + \frac{100}{75} = \frac{80}{9} = 8.889$$

Degrees of Freedom:
df = (Number of Rows−1) × (Number of Columns−1) = (2−1) × (2−1) = 1

The critical chi-square value for 1 degree of freedom at the 5% level is approximately 3.841.
Since χ2 (8.889) is greater than the critical value (3.841), we reject the null hypothesis: the
intelligence of the boys is associated with the skill of their fathers.
4. The number of car accidents per month in a town was as follows: 6, 9, 4, 12, 8, 20, 14,
15, 2, and 10. Test the hypothesis that the number of accidents is same every month.
Ans.4
Given the total number of accidents over the 10 months are: 6 + 9 + 4 + 12 + 8 + 20 + 14 + 15
+ 2 + 10 = 100.

Under the null hypothesis, these accidents should be uniformly distributed over the 10 months
period and hence the expected number of accidents for each of the 10 months are 100/10 = 10.

Months    Observed No. of    Expected No. of    (fo - fe)    (fo - fe)²    (fo - fe)²/fe
          accidents (fo)     accidents (fe)
1         6                  10                 -4           16            1.6
2         9                  10                 -1           1             0.1
3         4                  10                 -6           36            3.6
4         12                 10                 2            4             0.4
5         8                  10                 -2           4             0.4
6         20                 10                 10           100           10.0
7         14                 10                 4            16            1.6
8         15                 10                 5            25            2.5
9         2                  10                 -8           64            6.4
10        10                 10                 0            0             0.0
Total     100                100                                           26.6

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 26.6$$

Degrees of Freedom:
df = (Number of Categories −1) = 10 - 1 = 9

The critical chi-square value for 9 degrees of freedom is approximately 16.919 at the 5% level
(21.666 at the 1% level). Since χ2 (26.6) is greater than both critical values, we reject the
null hypothesis: the number of accidents is not the same every month.
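The goodness-of-fit calculation in the table above can be sketched as a small function. The name `chi_square_gof` is hypothetical; when no expected frequencies are supplied it assumes the null hypothesis of equal frequencies in every category.

```python
def chi_square_gof(observed, expected=None):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if expected is None:
        # Default null hypothesis: the total is spread equally over all categories.
        expected = [sum(observed) / len(observed)] * len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


accidents = [6, 9, 4, 12, 8, 20, 14, 15, 2, 10]
chi2_accidents, df_accidents = chi_square_gof(accidents)
print(round(chi2_accidents, 2), df_accidents)
```

Passing an explicit `expected` list covers cases where the hypothesised frequencies are unequal, such as counts expected in a fixed ratio.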

5. In a particular industry the post graduate, graduate, undergraduates are in the ratio
2:3:5. A firm belonging to the industry had 400, 550 and 1050 postgraduates, graduates
and undergraduates on its pay-roll. Do they follow earlier observation about the
industry?
Ans.5 Given Data:
• Observed frequencies in the firm: 400 postgraduates, 550 graduates, 1050
undergraduates
• Expected ratio in the industry: 2:3:5
Expected Frequencies in the Firm (Based on Industry Ratio):
• Total observed count in the firm: 400 + 550 + 1050 = 2000
• Expected count for postgraduates: (2/10) × 2000 = 400
• Expected count for graduates: (3/10) × 2000 = 600
• Expected count for undergraduates: (5/10) × 2000 = 1000

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(400 - 400)^2}{400} + \frac{(550 - 600)^2}{600} + \frac{(1050 - 1000)^2}{1000}$$
$$\chi^2 = 0 + \frac{2500}{600} + \frac{2500}{1000} = \frac{40}{6} = 6.667$$

df = (Number of Categories −1) = 3 - 1 = 2.

The critical chi-square value for 2 degrees of freedom at the 5% level is approximately 5.991.
Since χ2 (6.667) is greater than the critical value (5.991), we reject the null hypothesis: the
firm does not follow the ratio observed for the industry.

6. Three hundred digits were chosen at random from a set of tables. The frequencies of
the digits were as follows:

Digits 0 1 2 3 4 5 6 7 8 9
Frequency 28 29 33 31 26 35 32 30 31 25

Using Chi-square test assess the hypothesis that the digits were distributed in equal
numbers in the table.
Ans.6 The total number of digits is 28 + 29 + 33 + 31 + 26 + 35 + 32 + 30 + 31 + 25 = 300.

Under the null hypothesis, the expected frequency for each of the 10 digits is
300/10 = 30.

Digits    Observed          Expected          (fo - fe)    (fo - fe)²    (fo - fe)²/fe
          Frequency (fo)    Frequency (fe)
0         28                30                -2           4             0.133
1         29                30                -1           1             0.033
2         33                30                3            9             0.300
3         31                30                1            1             0.033
4         26                30                -4           16            0.533
5         35                30                5            25            0.833
6         32                30                2            4             0.133
7         30                30                0            0             0.000
8         31                30                1            1             0.033
9         25                30                -5           25            0.833
Total     300               300                                          2.87

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{86}{30} = 2.87$$
Degrees of Freedom:
df = (Number of Categories −1) = 10 - 1 = 9

The critical chi-square value for 9 degrees of freedom is approximately 16.919 at the 5% level
(21.666 at the 1% level). Since χ2 (2.87) is less than the critical value, we do not reject the
null hypothesis: the digits can be taken as equally distributed in the table.

UNIT 11
1. Table 11.8 depicts the data of the number of claims processed per day of a group of
four employees of XYZ Insurance Company observed for a number of days. Test the
hypothesis that the employees mean claims per day are all the same. Use 5% level of
significance.
Table 11.8: Claims Processed per Day of Four Employees of an XYZ
Insurance Company
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9
Ans.1 Given Data:
Claims processed per day for four employees of XYZ Insurance Company.
Employee 1: 15 17 14 12
Employee 2: 12 10 13 17
Employee 3: 11 14 13 15 12
Employee 4: 13 12 12 14 10 9

Group Mean:
Mean (Employee 1) = (15 + 17 + 14 + 12) / 4 = 58 / 4 = 14.5
Mean (Employee 2) = (12 + 10 + 13 + 17) / 4 = 52 / 4 = 13
Mean (Employee 3) = (11 + 14 + 13 + 15 + 12) / 5 = 65 / 5 = 13
Mean (Employee 4) = (13 + 12 + 12 + 14 + 10 + 9) / 6 = 70 / 6 = 11.67

Grand Mean = (58 + 52 + 65 + 70) / 19 = 245 / 19 = 12.89
(Because the groups have unequal sizes, the grand mean is the mean of all 19 observations,
not the average of the four group means.)

Between-Group Sum of Squares (SSB), weighting each group by its size:
SSB = 4(14.5 - 12.89)² + 4(13 - 12.89)² + 5(13 - 12.89)² + 6(11.67 - 12.89)² = 19.46

Within-Group Sum of Squares (SSW):
SSW = (15 - 14.5)² + (17 - 14.5)² + ........ + (9 - 11.67)² = 13 + 26 + 10 + 17.33 = 66.33

Degrees of Freedom (DF):
DF_Between = 4 - 1 = 3
DF_Within = 19 (total observations) - 4 (number of groups) = 15

Mean Squares (MS):
MS_Between = SSB / DF_Between = 19.46 / 3 ≈ 6.49
MS_Within = SSW / DF_Within = 66.33 / 15 ≈ 4.42

F-statistic:
F = MS_Between / MS_Within = 6.49 / 4.42 ≈ 1.47

At a 5% level of significance with DF_Between = 3 and DF_Within = 15, the critical F-value is
approximately 3.29.
Since 1.47 < 3.29, we fail to reject the null hypothesis. Therefore, the difference between the
employees' mean claims per day is not significant.

2. Four makes of bulbs were tested for their length of life (in ‘000 hours) and the data
obtained is depicted in table 11.9. Test whether the length of their life is significantly
different.
Table 11.9: Four Different Makes of Bulbs with Their Length of Life
MAKE I    MAKE II    MAKE III    MAKE IV
20        19         21          15
23        15         19          17
18        17         20          16
17        20         17          18
          16         16
Ans.2
T = Sum of all observations = 78 + 87 + 93 + 66 = 324, N = 18
$$\text{Correction Factor} = \frac{T^2}{N} = \frac{324^2}{18} = 5832$$
SST (Total Sum of the Squares) = Sum of squares of all observations − T²/N
$$SST = (20^2 + 23^2 + 18^2 + 17^2 + \dots + 16^2 + 18^2) - 5832 = 5914 - 5832 = 82$$
$$SSC = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4}\right] - 5832 = \frac{78^2}{4} + \frac{87^2}{5} + \frac{93^2}{5} + \frac{66^2}{4} - 5832$$
SSC = (1521 + 1513.8 + 1729.8 + 1089) - 5832 = 21.6

Sum of the squares of the Error within columns (samples):
SSE = SST – SSC = 82 – 21.6 = 60.4
$$\text{Variance between samples: } MSC = \frac{SSC}{k - 1} = \frac{21.6}{4 - 1} = \frac{21.6}{3} = 7.2$$
(k is the number of columns and n is the total number of observations)
$$\text{Variance within the samples: } MSE = \frac{SSE}{n - k} = \frac{60.4}{18 - 4} = \frac{60.4}{14} = 4.31$$
ANOVA TABLE:
Source of Variation    Sum of Squares    Degree Of Freedom       Mean Square
Between Samples        SSC = 21.6        k – 1 = 4 – 1 = 3       MSC = 7.2
Within Samples         SSE = 60.4        n – k = 18 – 4 = 14     MSE = 4.31
Total                  SST = 82          n – 1 = 18 – 1 = 17

F-statistic:
$$F_{Cal} = \frac{MSC}{MSE} = \frac{7.2}{4.31} = 1.67$$

The table value of ‘F’ at 5% level of significance for (3, 14) degrees of freedom (df) is 3.34.
Since 1.67 < 3.34, we fail to reject the null hypothesis. Therefore, the difference in length of
life between the makes is not significant.
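The one-way ANOVA arithmetic (correction factor, SST, SSC, SSE and F) can be sketched in plain Python. The function name `one_way_anova` and the dictionary keys are assumptions for illustration; the method mirrors the T²/N shortcut used in the answer and allows groups of unequal size.

```python
def one_way_anova(groups):
    """One-way ANOVA using the correction-factor shortcut; groups may differ in size."""
    values = [x for g in groups for x in g]
    N, k = len(values), len(groups)
    T = sum(values)
    cf = T ** 2 / N                                        # correction factor T^2 / N
    sst = sum(x * x for x in values) - cf                  # total sum of squares
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between samples
    sse = sst - ssc                                        # within samples
    msc = ssc / (k - 1)
    mse = sse / (N - k)
    return {"SST": sst, "SSC": ssc, "SSE": sse, "F": msc / mse, "df": (k - 1, N - k)}


bulbs = [
    [20, 23, 18, 17],      # Make I
    [19, 15, 17, 20, 16],  # Make II
    [21, 19, 20, 17, 16],  # Make III
    [15, 17, 16, 18],      # Make IV
]
result = one_way_anova(bulbs)
print({key: (round(v, 2) if isinstance(v, float) else v) for key, v in result.items()})
```

The computed F is then compared with the F table value for the returned degrees of freedom.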

3. Table 11.10 depicts the data on production rate by five workmen on four machines.
Test whether the rate is significantly different due to workers and machines.
Table 11.10: Production Rate of Five Workmen on Four Machines
Machines    Workmen
            I     II    III   IV    V
1           46    48    36    35    40
2           40    42    38    40    44
3           49    54    46    48    51
4           38    45    34    35    41
Ans.3 N = 20, T = Sum of all values = 850
$$\text{Correction Factor} = \frac{T^2}{N} = \frac{850^2}{20} = 36125$$
SST (Total Sum of the Squares) = Sum of squares of all observations − T²/N
$$SST = (46^2 + 40^2 + 49^2 + 38^2 + \dots + 51^2 + 41^2) - 36125 = 36754 - 36125 = 629$$
SSC (between workmen):
$$SSC = \left[\frac{(\sum X_{1i})^2}{n_1} + \frac{(\sum X_{2i})^2}{n_2} + \dots + \frac{(\sum X_{ni})^2}{n_n}\right] - \frac{T^2}{N} = \frac{173^2 + 189^2 + 154^2 + 158^2 + 176^2}{4} - 36125 = 201.5$$

Degrees of freedom = (c-1) = (5 -1) = 4
$$MSC = \frac{SSC}{c - 1} = \frac{201.5}{5 - 1} = \frac{201.5}{4} = 50.38$$

SSR (between machines):
$$SSR = \left[\frac{(\sum X_{1j})^2}{n_1} + \frac{(\sum X_{2j})^2}{n_2} + \dots + \frac{(\sum X_{nj})^2}{n_n}\right] - \frac{T^2}{N} = \frac{205^2 + 204^2 + 248^2 + 193^2}{5} - 36125 = 353.8$$

Degrees of freedom = (r-1) = (4 -1) = 3
$$MSR = \frac{SSR}{r - 1} = \frac{353.8}{4 - 1} = \frac{353.8}{3} = 117.93$$
SS residual or error: SSE = SST – SSC – SSR = 629 – 201.5 – 353.8 = 73.7
$$MSE = \frac{SSE}{(r - 1)(c - 1)} = \frac{73.7}{12} = 6.14$$

Since MSC > MSE we take F_c = MSC/MSE, and since MSR > MSE we take F_r = MSR/MSE.
$$F_c = \frac{MSC}{MSE} = \frac{50.38}{6.14} = 8.20$$
$$F_r = \frac{MSR}{MSE} = \frac{117.93}{6.14} = 19.20$$
For Workmen:
The calculated value of F_c is 8.20. The table value of F for (4,12) df at 5% level of significance
is 3.26. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between workmen is significant.
For Machine:
The calculated value of F_r is 19.20. The table value of F for (3,12) df at 5% level of significance
is 3.49. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between machines is significant.
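The two-way classification (one observation per cell) can be checked with a short sketch. The function name `two_way_anova` and its result keys are hypothetical; rows are machines and columns are workmen, as in Table 11.10.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication; table is a list of equal-length rows."""
    r, c = len(table), len(table[0])
    N = r * c
    T = sum(sum(row) for row in table)
    cf = T ** 2 / N                                            # correction factor
    sst = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf         # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf   # between columns
    sse = sst - ssr - ssc                                      # residual
    mse = sse / ((r - 1) * (c - 1))
    return {
        "SST": sst, "SSR": ssr, "SSC": ssc, "SSE": sse,
        "F_rows": (ssr / (r - 1)) / mse,
        "F_cols": (ssc / (c - 1)) / mse,
    }


production = [
    [46, 48, 36, 35, 40],  # machine 1
    [40, 42, 38, 40, 44],  # machine 2
    [49, 54, 46, 48, 51],  # machine 3
    [38, 45, 34, 35, 41],  # machine 4
]
anova = two_way_anova(production)
print({key: round(value, 2) for key, value in anova.items()})
```

`F_cols` is compared with the table value for (c − 1, (r − 1)(c − 1)) degrees of freedom and `F_rows` with the value for (r − 1, (r − 1)(c − 1)).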

4. The percentage of sugar content of tobacco in two samples is depicted in table 11.11.
Test whether their population variances are same.
Table 11.11: Percentage of Sugar Content of Tobacco in Two Samples
Sample A 2.4 2.7 2.6 2.1 2.5
Sample B 2.7 3.0 2.8 3.1 2.2 3.6
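Ans.4
Sample A:
$$\bar{x} = \frac{2.4 + 2.7 + 2.6 + 2.1 + 2.5}{5} = 2.46$$
$$s_1^2 = \frac{1}{4}\sum (x_i - \bar{x})^2 = \frac{1}{4}\left[(2.4 - 2.46)^2 + (2.7 - 2.46)^2 + (2.6 - 2.46)^2 + (2.1 - 2.46)^2 + (2.5 - 2.46)^2\right] = \frac{0.212}{4} = 0.053$$

Sample B:
$$\bar{y} = \frac{2.7 + 3 + 2.8 + 3.1 + 2.2 + 3.6}{6} = 2.9$$
$$s_2^2 = \frac{1}{5}\sum (y_i - \bar{y})^2 = \frac{1}{5}\left[(2.7 - 2.9)^2 + (3 - 2.9)^2 + (2.8 - 2.9)^2 + (3.1 - 2.9)^2 + (2.2 - 2.9)^2 + (3.6 - 2.9)^2\right] = \frac{1.08}{5} = 0.216$$
F-statistic (the larger variance is placed in the numerator):
$$F = \frac{s_2^2}{s_1^2} = \frac{0.216}{0.053} \approx 4.08$$
Degrees of Freedom:
df1 (numerator, sample B) = 6-1 = 5
df2 (denominator, sample A) = 5-1 = 4

The table value of ‘F’ at 5% level of significance for (5, 4) degrees of freedom (df) is 6.26.
Since 4.08 < 6.26, we fail to reject the null hypothesis. Therefore, the difference between the
population variances is not significant.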

5. Three students determine the moisture content of samples of a powder, each student
taking a sample from each of 4 consignments. The results are given below:

Students    Consignment
            I     II    III   IV
1           9     10    9     10
2           12    12    10    11
3           11    11    9     12

Ans.5 N = 12, T = Sum of all values = 126
$$\text{Correction Factor} = \frac{T^2}{N} = \frac{126^2}{12} = 1323$$
SST (Total Sum of the Squares) = Sum of squares of all observations − T²/N
$$SST = (9^2 + 10^2 + 9^2 + 10^2 + \dots + 9^2 + 12^2) - 1323 = 1338 - 1323 = 15$$
SSC (between consignment):
$$SSC = \left[\frac{(\sum X_{1i})^2}{n_1} + \frac{(\sum X_{2i})^2}{n_2} + \dots + \frac{(\sum X_{ni})^2}{n_n}\right] - \frac{T^2}{N} = \frac{32^2}{3} + \frac{33^2}{3} + \frac{28^2}{3} + \frac{33^2}{3} - 1323$$
= 341.33 + 363 + 261.33 + 363 – 1323 = 5.67
Degrees of freedom = (c-1) = (4 -1) = 3
$$MSC = \frac{SSC}{c - 1} = \frac{5.67}{4 - 1} = \frac{5.67}{3} = 1.89$$

SSR (between students):
$$SSR = \left[\frac{(\sum X_{1j})^2}{n_1} + \frac{(\sum X_{2j})^2}{n_2} + \dots + \frac{(\sum X_{nj})^2}{n_n}\right] - \frac{T^2}{N} = \frac{38^2}{4} + \frac{45^2}{4} + \frac{43^2}{4} - 1323$$
= 361 + 506.25 + 462.25 – 1323 = 6.5
Degrees of freedom = (r-1) = (3-1) = 2
$$MSR = \frac{SSR}{r - 1} = \frac{6.5}{3 - 1} = \frac{6.5}{2} = 3.25$$

SS residual or error: SSE = SST – SSC – SSR = 15 – 5.67 – 6.5 = 2.83
$$MSE = \frac{SSE}{(r - 1)(c - 1)} = \frac{2.83}{6} = 0.47$$
Since MSC > MSE we take F_c = MSC/MSE, and since MSR > MSE we take F_r = MSR/MSE.
$$F_c = \frac{MSC}{MSE} = \frac{1.89}{0.47} = 4.0$$
$$F_r = \frac{MSR}{MSE} = \frac{3.25}{0.47} = 6.88$$
For Consignment:
The calculated value of F_c is 4.0. The table value of F for (3,6) df at 5% level of significance is
4.76. Since the calculated value of F is less than the table value, we do not reject the null
hypothesis and conclude that the difference between consignments is not significant.
For Student:
The calculated value of F_r is 6.88. The table value of F for (2,6) df at 5% level of significance
is 5.14. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between students is significant.

UNIT 12
1. Table 12.11 depicts the marks obtained by 10 students in commerce and statistics.
Calculate the rank correlation.

Table 12.11: Marks of Students Obtained in Commerce and Statistics

Marks in Statistics 35 90 70 40 95 45 60 85 80 50
Marks in Commerce 45 70 65 30 90 40 50 75 85 60
Ans.1
Marks in      Rank 1    Marks in     Rank 2    D = R1 - R2    D²
Statistics              Commerce
35            10        45           8         2              4
90            2         70           4         -2             4
70            5         65           5         0              0
40            9         30           10        -1             1
95            1         90           1         0              0
45            8         40           9         -1             1
60            6         50           7         -1             1
85            3         75           3         0              0
80            4         85           2         2              4
50            7         60           6         1              1
                                               ∑D = 0         ∑D² = 16

$$r = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(16)}{10(10^2 - 1)} = 1 - \frac{96}{990} = 1 - 0.0970 = 0.9030$$
2. Calculate Spearman’s rank correlation coefficient between the series A and B depicted
in table 12.12.
Table 12.12: Series Data of Terminal Question 2
Series 1    57     59     62     63     64     65     55     58     57
Series 2    113    117    126    126    130    129    111    116    112
Ans.2
Series 1    Rank 1    Series 2    Rank 2    D = R1 - R2    D²
57          7.5       113         7         0.5            0.25
59          5         117         5         0              0
62          4         126         3.5       0.5            0.25
63          3         126         3.5       -0.5           0.25
64          2         130         1         1              1
65          1         129         2         -1             1
55          9         111         9         0              0
58          6         116         6         0              0
57          7.5       112         8         -0.5           0.25
                                            ∑D = 0         ∑D² = 3
Here the number 57 is repeated twice in series 1 and the number 126 is repeated twice in series 2.
Therefore, in series 1, m1 = 2 and in series 2, m2 = 2.
$$R = 1 - \frac{6\left(\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right)}{N^3 - N}$$
$$R = 1 - \frac{6\left(3 + \frac{1}{12}(8 - 2) + \frac{1}{12}(8 - 2)\right)}{9^3 - 9} = 1 - \frac{6(4)}{720} = 1 - 0.033 = 0.967$$
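Rank correlation with tied values can be sketched in plain Python. The function names `average_ranks` and `spearman_with_tie_correction` are assumptions for illustration; ranks are assigned in descending order (largest value = rank 1), ties receive the average of the ranks they occupy, and each group of m tied values adds (m³ − m)/12 to ΣD², as in the formula used in this answer.

```python
def average_ranks(values):
    """Rank in descending order (largest = 1); tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    first_position, counts = {}, {}
    for position, v in enumerate(ordered, start=1):
        first_position.setdefault(v, position)
        counts[v] = counts.get(v, 0) + 1
    # A value tied m times occupies positions p .. p+m-1, whose average is p + (m-1)/2.
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman_with_tie_correction(x, y):
    """Spearman's R with the (m^3 - m)/12 correction added for each group of ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (x, y):
        for m in (series.count(v) for v in set(series)):
            correction += (m ** 3 - m) / 12  # zero when m = 1 (no tie)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


series_1 = [57, 59, 62, 63, 64, 65, 55, 58, 57]
series_2 = [113, 117, 126, 126, 130, 129, 111, 116, 112]
print(average_ranks(series_1))
print(round(spearman_with_tie_correction(series_1, series_2), 3))
```

When there are no ties the correction is zero and the function reduces to the ordinary formula 1 − 6ΣD²/(N³ − N).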

3. For the data in table 12.13, obtain the two lines of regression and estimate the
blood pressure when the age is 50 years.
Age in years(X) 56 42 72 39 63 47 52 49 40 42 68 60
BP (Y) 127 112 140 118 129 116 130 125 115 120 135 133
Ans.3
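
No worked answer is given for this question. One way to obtain the two regression lines is sketched below, assuming the usual least-squares coefficients $b_{yx} = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2$ and $b_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (Y - \bar{Y})^2$. The function name `regression_lines` is illustrative.

```python
def regression_lines(x, y):
    """Return means and the least-squares slopes of Y on X and X on Y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy / sxx, sxy / syy

age = [56, 42, 72, 39, 63, 47, 52, 49, 40, 42, 68, 60]
bp = [127, 112, 140, 118, 129, 116, 130, 125, 115, 120, 135, 133]

mean_x, mean_y, byx, bxy = regression_lines(age, bp)
# Y on X: Y - mean_y = byx * (X - mean_x); X on Y: X - mean_x = bxy * (Y - mean_y)
bp_at_50 = mean_y + byx * (50 - mean_x)

print(mean_x, mean_y, round(byx, 4), round(bxy, 4), round(bp_at_50, 2))
```

The estimate of blood pressure at age 50 uses the line of Y on X, since blood pressure is the variable being predicted.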

4. Table 12.14 depicts the results that were worked out from scores in statistics and
mathematics in a certain examination.
Table 12.14: Results of Scores in Statistics and Mathematics Examination

                     Scores in Statistics (X)   Scores in Mathematics (Y)
Mean                 39.5                       47.5
Standard Deviation   10.8                       17.8
Karl Pearson’s correlation coefficient between X and Y = 0.42. Find both the regression
lines. Use these lines to estimate the value of Y when X = 50 and the value of X when Y =
30.
Ans.4 Given $\bar{X} = 39.5$, $\bar{Y} = 47.5$,
$\sigma_x = 10.8$, $\sigma_y = 17.8$,
$r(x, y) = 0.42$

∴ Regression coefficient of Y on X
$b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{17.8}{10.8} = 0.692$

∴ Regression line of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
Y - 47.5 = 0.692(X - 39.5)
Y = 0.692X + 20.17
When X = 50,
Y = 0.692(50) + 20.17 = 54.77

∴ Regression coefficient of X on Y
$b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10.8}{17.8} = 0.255$

∴ Regression line of X on Y is
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
X - 39.5 = 0.255(Y - 47.5)
X = 0.255Y + 27.39
When Y = 30,
X = 0.255(30) + 27.39 = 35.04
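
The two lines can also be computed directly from the summary statistics. The sketch below assumes only the formulas used above (slope = r × ratio of standard deviations, line passing through the two means); the function name `line_from_summary` is hypothetical. Because it keeps full precision instead of rounded coefficients, its estimates may differ from a hand calculation in the second decimal place.

```python
def line_from_summary(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (slope, intercept) of the regression of the dependent on the independent variable."""
    slope = r * sd_dep / sd_ind
    return slope, mean_dep - slope * mean_ind

# Y on X: Y is dependent; X on Y: X is dependent.
byx, a_yx = line_from_summary(47.5, 39.5, 17.8, 10.8, 0.42)
bxy, a_xy = line_from_summary(39.5, 47.5, 10.8, 17.8, 0.42)

y_at_50 = a_yx + byx * 50
x_at_30 = a_xy + bxy * 30
print(round(byx, 3), round(bxy, 3), round(y_at_50, 2), round(x_at_30, 2))
```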

UNIT 13
1. What is business forecasting?
Ans.1 Business forecasting refers to the analysis of past and present economic conditions with
the object of drawing inferences about probable future business conditions. The process of
making definite estimates of the future course of events is referred to as forecasting, and the
figure or statement obtained from the process is known as a 'forecast'. The future course of
events is rarely known with certainty, so an organised system of forecasting helps a business
prepare for it. The following are two aspects of scientific business forecasting:

1. Analysis of past economic conditions


For this purpose, the components of time series are to be studied. The secular trend shows how
the series has been moving in the past and what its future course is likely to be over a long
period of time. The cyclic fluctuations would reveal whether the business activity is subjected
to a boom or depression. The seasonal fluctuations would indicate the seasonal changes in the
business activity.

2. Analysis of present economic conditions


The object of analysing present economic conditions is to study those factors which affect the
sequential changes expected on the basis of the past conditions. Such factors are new
inventions, changes in fashion, changes in economic and political spheres, economic and
monetary policies of the government, war, etc. These factors may affect and alter the duration
of trade cycle. Therefore, it is essential to keep in mind the present economic conditions since
they have an important bearing on the probable future tendency.

2. Explain the objectives of business forecasting.


Ans.2 Business forecasting involves predicting future trends and outcomes in the business
environment. The primary objectives of business forecasting are to provide valuable insights
and help organizations make informed decisions. Here are the key objectives of business
forecasting:

1. Strategic Planning:
Forecasting helps in long-term strategic planning by providing insights into potential
opportunities and threats. It allows businesses to align their strategies with anticipated market
conditions.

2. Resource Allocation:
By forecasting demand for products or services, businesses can allocate resources such as
manpower, raw materials, and capital more efficiently. This prevents shortages or excesses,
optimizing operational efficiency.

3. Financial Planning:
Forecasting assists in financial planning by predicting future sales, revenues, and expenses.
This information is crucial for budgeting, setting financial goals, and ensuring the availability
of funds when needed.

4. Risk Management:
Businesses face various risks, including economic fluctuations, market changes, and
unexpected events. Forecasting helps identify potential risks, allowing organizations to
develop risk mitigation strategies and contingency plans.

5. Inventory Management:
Forecasting demand helps in managing inventory levels effectively. By avoiding overstocking
or stockouts, businesses can reduce holding costs, improve cash flow, and enhance customer
satisfaction.

6. Market Research and Competitive Analysis:


Forecasting is a valuable tool for conducting market research and analyzing the competitive
landscape. It helps businesses understand customer preferences, market trends, and the actions
of competitors.

7. Product Development and Innovation:


Anticipating future market demands enables businesses to align their product development and
innovation strategies with changing consumer needs. This proactive approach enhances a
company's competitive edge.

8. Human Resource Planning:


Forecasting assists in predicting workforce requirements based on future business needs. This
helps in recruitment, training, and talent management, ensuring that the organization has the
right people in the right positions.

9. Sales and Marketing Strategy:


Businesses use forecasting to set realistic sales targets and develop effective marketing
strategies. It enables organizations to tailor their promotional efforts to match expected market
conditions.

10. Performance Evaluation:


Comparing actual outcomes with forecasted values helps businesses evaluate their
performance. It provides valuable feedback for adjusting strategies and improving the
accuracy of future forecasts.

11. Stakeholder Communication:


Forecasting results are often communicated to various stakeholders, including investors,
employees, and suppliers. Clear communication helps build trust and transparency in the
business operations.

3. Explain the steps involved in forecasting.


Ans.3 Forecasting of business fluctuations consists of the following steps:
1. Understanding why changes in the past have occurred
One of the basic principles of statistical forecasting is that the forecaster should use past
performance data. The current rate and changes in the rate constitute the basis of forecasting.
Once they are known, various mathematical techniques can develop projections from them. If
an attempt is made to forecast business fluctuations without understanding why past
changes have taken place, the forecast will be purely mechanical: it will be based solely upon
the application of mathematical formulae and will be subject to serious error.
2. Determining which phases of business activity must be measured
After understanding the reasons of occurrence of business fluctuations, it is necessary to
measure certain phases of business activity to predict what changes will probably follow the
present level of activity.
3. Selecting and compiling data to be used as measuring devices
There is an interdependent relationship between the selection of statistical data and determination
of why business fluctuations occur. Statistical data cannot be collected and analysed in an
intelligent manner unless there is sufficient understanding of business fluctuations. It is
important that reasons for business fluctuations be stated in such a manner that it is possible
to secure data that is related to the reasons.
4. Analysing the data
Lastly, the data is analysed to understand the reason why change occurs. For example, if it
is reasoned that a certain combination of forces will result in a given change, the statistical
part of the problem is to measure these forces, from the data available, to draw conclusions on
the future course of action. The methods of drawing conclusions may be called forecasting
techniques.

4. Explain the characteristics of business forecasting.


Ans.4
1. Informed Decision-Making: The primary purpose of business forecasting is to support
informed decision-making. It provides decision-makers with insights into potential future
scenarios, helping them plan and allocate resources effectively.

2. Based on Historical Data: Business forecasting relies on historical data and trends to
identify patterns and make predictions about future events. Analysing past performance
provides a foundation for understanding and anticipating future behaviour.

3. Quantitative and Qualitative: Forecasting utilizes both quantitative and qualitative data.
Quantitative methods involve numerical data and statistical analysis, while qualitative
methods consider non-numeric factors such as market trends, customer preferences, and expert
opinions.

4. Time Frame: Forecasting is inherently future-oriented and involves predicting outcomes
over a specific time frame. The time horizon can range from short-term forecasts, such as
monthly sales projections, to long-term forecasts, covering several years or more.

5. Dynamic Nature: Business forecasting acknowledges the dynamic and ever-changing
nature of the business environment. External factors, such as economic conditions,
technological advancements, and market trends, can impact forecasts, requiring continuous
monitoring and adjustment.

6. Subject to Uncertainty: Despite using historical data and advanced modeling techniques,
forecasting is inherently uncertain. Various unforeseen events, such as natural disasters,
geopolitical changes, or unexpected market shifts, can influence outcomes.

7. Multiple Methods: There are various methods and techniques for business forecasting,
each suited to different situations and data types. Common methods include time series
analysis, regression analysis, market research, and expert judgment. A combination of
methods may be used for more accurate predictions.

8. Iterative Process: Business forecasting is often an iterative process, involving regular
review and refinement. As new information becomes available or circumstances change,
forecasts may need to be adjusted to reflect the evolving business landscape.

9. Risk Management: Forecasting is closely linked to risk management. By identifying
potential future scenarios and uncertainties, businesses can develop strategies to mitigate risks
and capitalize on opportunities.

10. Communication Tool: Forecasts serve as a communication tool within the organization.
They are shared with stakeholders, including executives, managers, investors, and employees,
to provide a common understanding of future expectations and goals.

11. Continuous Monitoring and Evaluation: Successful forecasting requires continuous
monitoring and evaluation of actual outcomes compared to predicted values. This feedback
loop helps refine forecasting models and improve their accuracy over time.

12. Flexibility and Adaptability: Business forecasting must be flexible and adaptable to
changing circumstances. Organizations should be prepared to adjust their forecasts and
strategies as new information emerges or as market conditions evolve.

5. Differentiate between prediction, projection and forecasting.


Ans.5
Criteria | Prediction | Projection | Forecasting
Definition | A prediction is an estimate based solely on past data of the series under investigation. It is purely a statistical extrapolation. | A projection is a prediction, where the extrapolated values are subject to certain numerical assumptions. | A forecast is an estimate which relates the series in which we are interested to external factors.
Data Usage | Relies on historical data and models | Uses current data and assumptions | Integrates various data sources, models, and insights
Specificity | Specific, concrete | Estimations based on current trends | Comprehensive understanding of future developments
Flexibility | Limited flexibility | Somewhat flexible | Adaptable and considers a range of scenarios

6. Describe the limitations of business forecasting.


Ans.6 Business forecasting cannot be accurate due to various limitations which are mentioned
below.
• Forecasting cannot be accurate, because it is largely based on future events and there is
no guarantee that they will happen.
• Business forecasting is generally made by using statistical and mathematical methods.
However, these methods cannot claim to make an uncertain future a definite one.
• The underlying assumptions of business forecasting cannot be satisfied simultaneously.
In such a case, the results of forecasting will be misleading.
• The forecasting cannot guarantee the elimination of errors and mistakes. The managerial
decision will be wrong if the forecasting is done in a wrong way.
• Factors responsible for economic changes are often difficult to discover and measure.
Hence, business forecasting becomes an unnecessary exercise.
• Business forecasting does not evaluate risks.
• The forecasting is made based on past information and data and relies on the assumption
that economic events are repeated under the same conditions. But there may be
circumstances where these are not repeated.
• Forecasting is not a one-off exercise. To be effective, it requires continuous attention.

7. Explain the main methods of business forecasting.


Ans.7
1. Business Barometers: Business barometers are indicators or metrics that serve as early
signs of changes in economic conditions. Examples include leading economic indicators, such
as stock market indices, consumer confidence indices, or manufacturing output, which are
monitored to predict broader economic trends.

2. Time Series Analysis: Time series analysis involves studying historical data to identify
patterns and trends over time. It includes methods like moving averages, autoregressive
integrated moving average (ARIMA) models, and seasonal decomposition. This method is
particularly useful for forecasting based on past observations.

3. Extrapolation: Extrapolation is a simple forecasting method that involves extending past
trends into the future without considering additional factors. While it's straightforward, it
assumes that historical patterns will continue, which may not always hold true in dynamic
business environments.

4. Regression Analysis: Regression analysis explores the relationship between dependent and
independent variables. It is a statistical method used to model and predict the impact of one or
more factors on a target variable. Regression models can be simple linear or multiple,
depending on the number of predictors.

5. Modern Econometric Methods: Modern econometric methods use advanced statistical
techniques to model and analyze economic relationships. These methods often include
simultaneous equations models, panel data analysis, and vector autoregression (VAR) models.
They provide a more sophisticated approach to capturing complex interactions in economic
data.

6. Exponential Smoothing Method: Exponential smoothing is a time series forecasting
method that assigns exponentially decreasing weights to past observations. The weighted
averages give more importance to recent observations, making the method responsive to
changes. In its simple form it suits series without a pronounced trend or seasonal pattern;
extensions such as Holt's and Holt-Winters' methods handle trend and seasonality.

Each of these methods has its strengths and weaknesses, and the choice of method depends on
factors such as the nature of the data, the forecasting horizon, and the specific requirements of
the business or industry. Often, a combination of methods is used to provide a more robust and
accurate forecast.
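
To make the exponential smoothing method concrete, here is a minimal sketch of simple exponential smoothing, where each smoothed value is `alpha` times the latest observation plus `(1 - alpha)` times the previous smoothed value. The sales figures and the choice `alpha = 0.5` are hypothetical, and starting the smoothed series at the first observation is one common convention, assumed here.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]                 # initialise with the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [100, 110, 105, 120]               # hypothetical monthly sales
smoothed = exponential_smoothing(sales, alpha=0.5)
print(smoothed)                            # the last value serves as the next-period forecast
```

A larger `alpha` makes the forecast follow recent observations more closely, while a smaller `alpha` gives a smoother series.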

8. Critically examine the important theories of business forecasting.


Ans.8 1. Sequence or time-lag theory: Sequence or time-lag theory suggests that economic
events follow a specific sequence or time pattern. Changes in certain economic indicators,
such as production or employment, are believed to occur in a sequence with a time lag. This
theory helps in understanding the timing and duration of economic cycles.

2. Action and reaction theory: Action and reaction theory posits that economic events are
interconnected, with one event triggering a series of reactions. For instance, a government
policy change or a shift in consumer behavior may lead to a chain reaction of events in the
economy. This theory emphasizes the importance of identifying causal relationships between
economic variables.

3. Economic rhythm theory: Economic rhythm theory suggests that economic activities
exhibit rhythmic patterns or cycles. These cycles, often characterized as boom and bust phases,
are thought to follow a regular and predictable rhythm. Understanding these economic rhythms
assists in forecasting future trends and adjusting business strategies accordingly.

4. Specific historical analogy: The specific historical analogy theory involves drawing
comparisons between current economic conditions and past historical events. By identifying
similarities between the present situation and a specific historical period, forecasters can make
predictions based on the outcomes of similar circumstances in the past.

5. Cross-cut analysis theory: Cross-cut analysis theory involves examining various economic
indicators and factors simultaneously to make forecasts. Instead of focusing on one variable,
this theory emphasizes the importance of considering multiple factors that may impact each
other. The interconnectedness of different elements is crucial for a more comprehensive and
accurate forecast.

UNIT 14
1. What is meant by analysis of time series?
Ans.1 Time series analysis is a statistical method used to analyze and interpret data points
collected over time. In this analysis, data is ordered chronologically, allowing for the
examination of patterns, trends, and variations within the dataset. The primary goal of time
series analysis is to uncover meaningful insights, make predictions, or model the underlying
structure of the time-dependent data.
2. State the difference between seasonal variations and cyclical fluctuations.
Ans.2
Characteristic | Seasonal Variations | Cyclical Fluctuations
Nature | Regular and predictable patterns | Longer-term, repetitive patterns
Frequency | Repeats at fixed intervals (days, weeks, months, seasons) | No fixed interval, variable frequency
Duration | Short-term, limited duration | Longer-term, extended duration
Cause | Influenced by external factors like weather, holidays, or cultural events | Influenced by broader economic factors and business cycles

3. What is trend? State various methods of measuring it.


Ans.3 In time series analysis, a trend refers to a long-term pattern or direction in the data,
reflecting a systematic change over time. Trends can be upward (indicating growth or
expansion), downward (indicating decline or contraction), or flat (indicating stability).

Methods of Measuring Trend:


i. Free Hand or Graphic Methods: This method involves visually inspecting the time series
data and drawing a line that best represents the overall direction of the data points. While
simple, it is subjective and depends on the analyst's judgment.

ii. Semi Averages Method: The semi averages method divides the series into two equal halves
(omitting the middle observation when the number of observations is odd), computes the
average of each half, and plots each average against the mid-point of its half. The straight
line joining the two points is taken as the trend. It is a less subjective method compared to
free hand or graphic methods.

iii. Moving Average Method: Moving averages involve calculating the average of a set
number of adjacent data points at each time period. This method helps filter out short-term
fluctuations, making the underlying trend more apparent. Common types include simple
moving averages (SMA) and weighted moving averages (WMA).

iv. Method of Least Squares: The method of least squares is a statistical approach that
minimizes the sum of the squared differences between the observed and predicted values of
the dependent variable. When applied to time series analysis, it helps identify the line that best
fits the overall trend in the data. Linear regression is an example of the method of least squares.

4. Explain the moving average method of measuring long term trend.


Ans.4 The moving average method is a commonly used technique for measuring the long-
term trend in time series data. It involves calculating the average of a set number of adjacent
data points at each time period, which helps smooth out short-term fluctuations and highlights
the underlying trend. The moving average is particularly useful for identifying patterns in data
and reducing noise, making it easier to interpret trends over time.
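
As an illustration, the sketch below computes a 3-yearly moving average for the yearly production figures of Table 14.10 (1990 to 1996). The window length of 3 is an assumption chosen for the example; each average is conventionally placed against the middle year of its window.

```python
def moving_average(series, window):
    """Average of each run of `window` adjacent values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

production = [80, 90, 92, 83, 94, 99, 92]   # Table 14.10, in 1000 kg
trend = moving_average(production, window=3)

# With a window of 3, the first average belongs to 1991 and the last to 1995.
for year, value in zip(range(1991, 1996), trend):
    print(year, round(value, 2))
```

Note that no trend value is obtained for the first and last years, which is one practical cost of the method.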
5. What are the components of time series? Bring out the significance of moving average
in analysing a time series and point out its limitations.
Ans.5 Time series data can often be decomposed into four main components:

Long-term trend or secular trend: This component represents the underlying trend in the
data over an extended period. It indicates the overall direction in which the time series is
moving, ignoring short-term fluctuations.

Seasonal variations: Seasonal variations occur when a time series exhibits regular patterns or
cycles at specific intervals, such as daily, weekly, monthly, or yearly. These patterns are often
influenced by external factors like weather, holidays, or cultural events.

Cyclic variations: Cyclic variations are longer-term patterns that do not have a fixed duration.
Unlike seasonal variations, cyclic patterns are not as regular and might span multiple years.
Economic cycles, for example, can be considered cyclic variations.

Random variations (or residuals): Random variations, also known as residuals, represent
the irregular or unpredictable fluctuations in the time series that cannot be attributed to the
long-term trend, seasonal patterns, or cyclic variations. These variations are often caused by
random events and noise in the data.

Significance of Moving Average in Time Series Analysis:

Smoothing Trends: Moving averages are effective in smoothing out short-term fluctuations
or noise in time series data. They provide a clearer view of the underlying trends by averaging
out random variations, making it easier to identify the long-term movement.
Highlighting Patterns: Moving averages help in highlighting patterns and trends in the data,
making it easier for analysts to observe and interpret the direction in which the time series is
moving. This is especially useful for detecting cycles and identifying potential turning points.
Forecasting: Moving averages are used for forecasting future values in a time series. By
calculating moving averages over specific intervals, analysts can make predictions about the
next data point. This is particularly useful in identifying trends and making short-term
predictions.
Noise Reduction: They are effective in reducing the impact of random fluctuations or outliers,
providing a clearer picture of the overall behavior of the time series. This makes it easier to
discern meaningful patterns without being overly influenced by short-term irregularities.

Limitations of Moving Average in Time Series Analysis:

Lagging Indicator: Moving averages introduce a lag in the data because they are based on
past observations. This lag may cause delayed reactions to changes in the underlying patterns,
making it less effective for real-time analysis or forecasting.
Sensitivity to Window Size: The choice of the window size (the number of data points
included in the average) can significantly impact the results. Smaller windows provide more
responsiveness to recent changes but may be sensitive to noise, while larger windows offer
smoother trends but might overlook short-term variations.
Not Suitable for Irregular Data: Moving averages may not perform well with irregular time
series data that do not exhibit consistent patterns. In such cases, the averaging process may
distort the true nature of the data.
Inability to Capture Sudden Changes: Sudden and unexpected changes in the time series,
such as abrupt shifts or outliers, can be challenging for moving averages to capture. They may
take time to adjust to significant changes in the data.
Assumption of Stationarity: Moving averages assume stationarity, meaning that the
statistical properties of the time series remain constant over time. If the time series exhibits
non-stationary behaviour, the moving average may not provide accurate insights.

6. What is meant by secular trend? Discuss any two methods of isolating trend values in
a time series.
Ans.6 Secular Trend:
A secular trend, also known as a long-term trend, refers to the underlying, persistent movement
or direction in a time series over an extended period. It represents the gradual, sustained
increase or decrease in the values of a variable. Secular trends are usually observed over years
or decades and can be influenced by factors such as economic growth, technological
advancements, population changes, or other fundamental shifts in the environment.

Methods of Isolating Trend Values in a Time Series:

Moving Averages: Moving averages are a common method used to isolate the trend
component in a time series. This technique involves calculating the average of a specified
number of adjacent data points, effectively smoothing out short-term fluctuations and
highlighting the underlying trend. The choice of the window size is crucial, as smaller
windows provide more responsiveness to recent changes but may be sensitive to noise, while
larger windows offer smoother trends but might overlook short-term variations.

Exponential Smoothing: Exponential smoothing is another method used to isolate trend
values by assigning different weights to different observations. It gives more emphasis to
recent data points while gradually reducing the weight of older observations. The formula for
exponential smoothing involves updating the trend estimate based on the current observation
and the previous trend estimate. This method is particularly useful when there is a need to
respond quickly to changes in the underlying trend, and it is widely employed in forecasting
applications.

7. What is seasonal variation of a time series? Describe the various methods you know to
evaluate it and examine their relative merits.
Ans.7 Seasonal variation in a time series refers to the predictable and recurring patterns or
fluctuations that occur at regular intervals within a specific time period. These patterns often
correspond to calendar months, quarters, or other regular intervals. Understanding and
analysing seasonal variation is crucial in time series analysis, as it allows for better forecasting
and trend identification.
The main methods of measuring seasonal variations are:
1. Simple Average Method: The simple average method is the most straightforward way of
measuring seasonal variation. The values for each season (month or quarter) are averaged over
all the years, and each seasonal average is expressed as a percentage of the grand average of
these seasonal averages to give the seasonal index. The merit of the method lies in its
simplicity and ease of application. However, it assumes that the series has no appreciable
trend or cyclical movement, so it may give misleading indices when the data show a marked trend.

2. Ratio to Moving Averages Method: The ratio to moving averages method involves
dividing the value of a variable at a specific time point by the moving average of that variable
over a certain period. This method helps to smooth out short-term fluctuations and highlight
underlying trends. Its merit lies in its ability to provide a clearer picture of the relative
performance of the variable, reducing noise and emphasizing the more persistent patterns.
However, the choice of the moving average window size is crucial, and this method may lag
behind sudden changes in the data.

3. Chain or Link Relative Method: The chain or link relative method involves comparing
the current period's value with the previous period's value. This method is particularly useful
for identifying the rate of growth or decline between consecutive time points. Its merit lies in
its ability to capture the sequential relationship and highlight the direction and magnitude of
changes over time. It is especially effective when analysing data with varying growth rates.
However, it may be sensitive to outliers, and the interpretation of results depends on the base
period chosen.

4. Ratio to Trend Method: The ratio to trend method involves dividing the actual value of a
variable by its trend value at a specific time and expressing the result as a percentage.
Averaging these ratios for each season isolates the seasonal component of the time series. Its
merit lies in its ability to highlight deviations from the overall trend, which makes it useful
for distinguishing short-term seasonal fluctuations from the long-term trend. However,
accurate estimation of the trend component is crucial, and the method may be sensitive to
extreme values in the data.

8. Fit a straight-line trend to the following data and find the trend values.

Table 14.10: Yearly Production Data


Year Production in 1000(kg)
1990 80
1991 90
1992 92
1993 83
1994 94
1995 99
1996 92

Ans.8 The trend line can be fitted by using the method of least squares for the given data.

Year    Production (Y)   X = Year - 1993   XY          X²
1990    80               -3                -240        9
1991    90               -2                -180        4
1992    92               -1                -92         1
1993    83               0                 0           0
1994    94               1                 94          1
1995    99               2                 198         4
1996    92               3                 276         9
Total   ∑Y = 630         ∑X = 0            ∑XY = 56    ∑X² = 28

When ∑X = 0, the normal equations reduce to:

$\sum Y = Na$, therefore $a = \frac{\sum Y}{N} = \frac{630}{7} = 90$
$\sum XY = b\sum X^2$, therefore $b = \frac{\sum XY}{\sum X^2} = \frac{56}{28} = 2$

The estimated equation of the trend line is given by : Y = a + bX = 90 + 2X


If X = -3,
Y = 90 + 2(-3) = 84
If X = -2,
Y = 90 + 2(-2) = 86
If X = -1,
Y = 90 + 2(-1) = 88
If X = 0,
Y = 90 + 2(0) = 90
If X = 1,
Y = 90 + 2(1) = 92
If X = 2,
Y = 90 + 2(2) = 94
If X = 3,
Y = 90 + 2(3) = 96.
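
The fit can be reproduced with a short script. The sketch assumes an odd number of consecutive years, so that measuring X from the middle year makes ∑X = 0 and the simplified normal equations apply; the function name `fit_trend` is illustrative.

```python
def fit_trend(years, values):
    """Least-squares line Y = a + bX with X measured from the middle year.

    Assumes an odd number of consecutive years, so that sum(X) == 0.
    """
    n = len(years)
    origin = years[n // 2]
    x = [yr - origin for yr in years]
    a = sum(values) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi ** 2 for xi in x)
    return x, a, b

years = [1990, 1991, 1992, 1993, 1994, 1995, 1996]
production = [80, 90, 92, 83, 94, 99, 92]

x, a, b = fit_trend(years, production)
trend_values = [a + b * xi for xi in x]
for yr, t in zip(years, trend_values):
    print(yr, t)
```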

9. Find seasonal values for the data in table 14.11.


Table 14.11: Data of Terminal Question 9
Year I Quarter II Quarter III Quarter IV Quarter
1995 3.7 4.1 3.3 3.5
1996 3.7 3.9 3.6 3.6
1997 4.0 4.1 3.3 3.1
1998 3.3 4.4 4.0 4.0
Ans.9

Year                I Quarter   II Quarter   III Quarter   IV Quarter
1995                3.7         4.1          3.3           3.5
1996                3.7         3.9          3.6           3.6
1997                4.0         4.1          3.3           3.1
1998                3.3         4.4          4.0           4.0
Quarterly Total     14.7        16.5         14.2          14.2
Quarterly Average   3.675       4.125        3.55          3.55

$\text{Grand Average} = \frac{3.675 + 4.125 + 3.55 + 3.55}{4} = \frac{14.9}{4} = 3.725$

$\text{S.I. for I quarter} = \frac{\text{Average of I Quarter}}{\text{Grand Average}} \times 100 = \frac{3.675}{3.725} \times 100 = 98.66$

$\text{S.I. for II quarter} = \frac{\text{Average of II Quarter}}{\text{Grand Average}} \times 100 = \frac{4.125}{3.725} \times 100 = 110.74$

$\text{S.I. for III quarter} = \frac{\text{Average of III Quarter}}{\text{Grand Average}} \times 100 = \frac{3.55}{3.725} \times 100 = 95.30$

$\text{S.I. for IV quarter} = \frac{\text{Average of IV Quarter}}{\text{Grand Average}} \times 100 = \frac{3.55}{3.725} \times 100 = 95.30$

The seasonal values obtained are 98.66, 110.74, 95.30, 95.30.
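
The simple average calculation above can be sketched in a few lines. The variable names are illustrative; the data are those of Table 14.11.

```python
data = {
    1995: [3.7, 4.1, 3.3, 3.5],
    1996: [3.7, 3.9, 3.6, 3.6],
    1997: [4.0, 4.1, 3.3, 3.1],
    1998: [3.3, 4.4, 4.0, 4.0],
}

# Average each quarter across the years, then express it as a percentage
# of the grand average of the four quarterly averages.
quarter_averages = [sum(col) / len(col) for col in zip(*data.values())]
grand_average = sum(quarter_averages) / len(quarter_averages)
seasonal_indices = [round(100 * q / grand_average, 2) for q in quarter_averages]

print(seasonal_indices)
```

Because the indices are percentages of the grand average, the four of them add up to approximately 400.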


UNIT 15
1. What is index number? State its utility.
Ans.1 An index number is a number which is used to measure the level of a certain
phenomenon as compared to the level of the same phenomenon at some standard period. In
other words, an index number is a device for comparing the price, quantity, or value of a
group of articles in different situations, for example, at a certain place or period with that
at another place or period.
Utility: The primary purpose of index numbers is to measure relative temporal or cross-
sectional changes in a variable or a group of related variables which are not capable of being
directly measured. The greatest purpose of index numbers has been to measure and compare
the changes in prices and purchasing power of money which have received great attention
from economists for many years.
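
As a small illustration of comparing a group of prices with a base period, the sketch below computes a simple aggregative price index, assuming the common definition (sum of current-period prices divided by sum of base-period prices, multiplied by 100). The commodity prices are hypothetical and the function name is illustrative.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Price index of the current period with the base period taken as 100."""
    return sum(current_prices) / sum(base_prices) * 100

base_prices = [20, 50, 30]       # hypothetical base-year prices of three articles
current_prices = [25, 60, 40]    # hypothetical current-year prices of the same articles

index = simple_aggregative_index(base_prices, current_prices)
print(index)  # a value above 100 indicates a rise in the price level
```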
2. Discuss the problems of:
a. Selection of the base year
Ans.a Choosing the appropriate base year is a critical aspect in constructing index numbers
for measuring changes in economic variables. The base year serves as a reference point against
which subsequent values are compared. One challenge is that economic conditions can vary
over time, and selecting a base year that no longer represents the current economic structure
may lead to distorted results. If the base year is too distant, it might not accurately reflect the
current consumption patterns or market dynamics, potentially introducing bias into the index.
To address this problem, periodic updates to the base year may be necessary to ensure the
relevance of the index and its accuracy in reflecting actual changes in the economy.

b. Selection of weights in the construction of index numbers


Ans.b The selection of weights in constructing index numbers is crucial for accurately
representing the relative importance of different components in a composite index. The
problem arises when outdated or inaccurate weights are used, leading to a misrepresentation
of the actual significance of various elements. If the weights do not reflect current consumption
patterns or market shares, the resulting index may fail to capture the true impact of changes in
individual components on the overall index. Regularly updating weights based on recent data
is essential to ensure the index remains reflective of the current economic structure.
Additionally, the challenge lies in obtaining accurate and up-to-date information for
determining these weights, as data collection and processing can be resource-intensive and
subject to delays, potentially impacting the precision of the index. Regular reviews and
adjustments to weights are necessary to enhance the reliability and relevance of index numbers
in measuring economic changes accurately.

3. What are the characteristics of an index number?


Ans.3 1. Expressed in numbers: Index numbers represent relative changes, such as an
increase in production or a reduction in prices, in numerical form.

2. Expressed in percentage: Index numbers are expressed in terms of percentages so as to
show the extent of relative change, where the value of the base is taken as 100 but the sign
of percentage (%) is not used.

3. Relative measure: Index numbers measure changes which are not capable of direct
measurement.

4. Specialised averages: An index number is a special type of average, generally a weighted
average. In a simple average the data are homogeneous and share the same unit of
measurement, whereas an index number averages variables that have different units of
measurement.

5. Basis of comparison: Index numbers by their very nature are comparative. They compare
changes over time or between places or similar categories.
4. Construct Fisher’s ideal index for the data depicted in table 15.12

Commodity    Base year 1997      Current year 2005
             Price    Qty        Price    Qty
A            16       110        25       132
B            5        220        5        264
C            10       132        15       165
D            25       66         30       55
Ans.4
Commodity    Base year 1997          Current year 2005
             Price(p0)   Qty(q0)     Price(p1)   Qty(q1)    p0q0    p0q1    p1q0    p1q1
A            16          110         25          132        1760    2112    2750    3300
B            5           220         5           264        1100    1320    1100    1320
C            10          132         15          165        1320    1650    1980    2475
D            25          66          30          55         1650    1375    1980    1650
Total                                                       ∑p0q0   ∑p0q1   ∑p1q0   ∑p1q1
                                                            =5830   =6457   =7810   =8745

Fisher's Method:
P01 = √[(Σp1q0 / Σp0q0) × (Σp1q1 / Σp0q1)] × 100
    = √[(7810 / 5830) × (8745 / 6457)] × 100
    = √(68298450 / 37644310) × 100 = 134.70
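
As a cross-check, Fisher's ideal index can be computed directly from the prices and quantities in the table. The sketch below is illustrative: the function name `fisher_index` and the tuple layout `(p0, q0, p1, q1)` are assumptions, not part of the study material. Computed this way, Σp1q1 comes to 8745 and the index to about 134.70.

```python
from math import sqrt


def fisher_index(rows):
    """Fisher's ideal index: geometric mean of the Laspeyres and Paasche indices."""
    sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)
    sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)
    sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)
    sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)
    laspeyres = sum_p1q0 / sum_p0q0  # base-year quantities as weights
    paasche = sum_p1q1 / sum_p0q1    # current-year quantities as weights
    return sqrt(laspeyres * paasche) * 100


# (p0, q0, p1, q1) for commodities A, B, C and D
rows = [(16, 110, 25, 132), (5, 220, 5, 264), (10, 132, 15, 165), (25, 66, 30, 55)]

fisher = fisher_index(rows)
print(f"Fisher's ideal index = {fisher:.2f}")
```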

5.The table 15.13 depicts the price of commodities along with the weights of respective
commodities. Calculate index number for 2000 based on the year 1995.
Table 15.13: Price of Commodities along with the Weights
Commodity 1995 2000 Weights
A 0.50 0.75 2
B 0.60 0.75 5
C 2.00 2.40 4
D 1.80 2.10 8
E 8.00 10.00 1
Ans.5
Index Number for 2000 = (Total expense in 2000 / Total expense in 1995) × 100 = (∑p1w / ∑p0w) × 100

Commodity    1995(p0)    2000(p1)    Weights (w)    p0w     p1w
A            0.50        0.75        2              1       1.5
B            0.60        0.75        5              3       3.75
C            2.00        2.40        4              8       9.6
D            1.80        2.10        8              14.4    16.8
E            8.00        10.00       1              8       10
Total                                               ∑p0w = 34.4    ∑p1w = 41.65

Index Number for 2000 = (41.65 / 34.4) × 100 = 121.08
The required index number for the year 2000 is 121.08.
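
The weighted aggregate calculation can be verified with a few lines of Python. This is a sketch under the assumption that each commodity is stored as a `(p0, p1, w)` tuple; the function name `weighted_aggregate_index` is hypothetical. Summing the p1w column without rounding gives 41.65, so the index works out to about 121.08.

```python
def weighted_aggregate_index(items):
    """Weighted aggregate price index: (sum of p1*w / sum of p0*w) * 100."""
    base_total = sum(p0 * w for p0, p1, w in items)
    current_total = sum(p1 * w for p0, p1, w in items)
    return current_total / base_total * 100


# (price in 1995, price in 2000, weight) for commodities A to E
items = [
    (0.50, 0.75, 2),
    (0.60, 0.75, 5),
    (2.00, 2.40, 4),
    (1.80, 2.10, 8),
    (8.00, 10.00, 1),
]

index_2000 = weighted_aggregate_index(items)
print(f"Index number for 2000 = {index_2000:.2f}")
```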

SELF ASSESSMENT QUESTIONS


UNIT 1
1. In which of the following situations would you like to use Statistics?
a) Buying a house
b) Purchasing medicine prescribed by a doctor
c) Investing funds in several options
d) Attending relatives marriages
2. Out of the following, which one does not refer to a mass of data?
a) Banking Statistics
b) Mathematical Statistics
c) Agricultural Statistics
d) Income Statistics
3. Which of the following statement is most appropriate?
a) Nature believed in statistics
b) Nature created statistics
c) Nature believed in variation
d) Nature believed in symmetrical variation
4. Which of the following statement is true?
a) Statistics enlarges physical vision
b) Statistics helps in estimation
c) Statistics quantifies uncertainty
d) Statistics is of no use to humanity.
5. The origin of statistics can be traced to
a) State
b) Commerce
c) Economics
d) Industry
6. According to the definition of Statistics given by Croxton and Cowden, what are the four
components of Statistics?
Ans.6 The four components of Statistics are collection, presentation, analysis and
interpretation of data
7. ‘Statistics may be called the science of counting’ is the definition given by
a) Croxton
b) A.L.Bowley
c) Boddington
d) Webster
8. In the olden days statistics was confined only to state affairs.
9. Mention some other areas where there is a scope of applying statistics.
Ans.9 Industrial Quality control, Investment policies, to find market potential for a product.
10. Answer the following:
a) Should the same degree of accuracy be applied while measuring the height of a mountain
and the height of a person? - NO
b) Does Statistics deal with qualitative data? - NO
11. Categorise the following data as qualitative or quantitative data
a) The number of transactions occurring in an ATM per day
Ans. a) Quantitative data
b) The popular brand name in cars is Maruthi
Ans. b) Qualitative data

12. The total sale of a product in Area A is 840 for 30 working days. The total sale of the same
product in Area B is 784 for 28 working days. Should Statistics be applied to get an appropriate
picture regarding the comparison of sales?
Ans.12 Yes

UNIT 2
1. What are the main stages in a survey? – Planning and execution
2. Training of investigators belongs to which stage? - Planning
3. Analysis of data is a part of the execution of survey. Is this correct? - Yes
4. Classify the following as finite or infinite population.
i) Production of a product in a factory for a day - Finite
ii) The set of rational numbers - Infinite
iii) The weight of newborn babies measured up to first decimal place in a state during the first
week of February 2008 - Finite
5. Classify the following as an attribute or a variable.
i) Eye colour of human beings - Attribute
ii) Number of pages in a book of various subjects - Variable
6. Classify the following as discrete or continuous variable
i) Number of shares sold each day in a stock market. - Discrete
ii) Temperatures recorded every half hour at a regional meteorological centre. - Continuous
7. Statistics can best be considered as
i) both Art and Science
ii) Art
iii) Science
iv) neither Art nor Science
8. Data that possess numerical properties are known as
i) Quantitative data
ii) Qualitative data
iii) Primary data
iv) Parametric data
9. A tool of all science in research and making an intelligent judgement is
i) Statistics
ii) Collection
iii) Data
iv) Judgement
10. State whether the following data are Primary or Secondary.
i) An official of the Census Board of India is preparing a report on census of population based
on the survey data that is collected by the Census Board. - Primary Data
ii) An HR representative of a software company is deciding on the time taken to perform a
particular job on a project based on random observations collected by him. - Primary Data
iii) A neurologist is examining the relationship between cigarette smoking and brain tumour
based on the data published in a famous neurology journal. – Secondary Data

11. When population under investigation is infinite, we should use


i) sample method
ii) census method
iii) neither census nor sample method
iv) both i) and ii)
12. State True or False:
i) Census conducted by Government of India is an example of primary data. - TRUE
ii) TV News Bulletins gather information on any event through their agents. - TRUE
iii) Schedules make respondents record their answers. - FALSE
iv) A covering letter to the questionnaire brings confidence in respondents. - TRUE
v) Questions in questionnaire should be lengthy. - FALSE
13. State whether each of the following variables is qualitative or quantitative.
i) Age - Quantitative
ii) Gender - Qualitative
iii) Class Rank - Qualitative
iv) Number of people favouring the death penalty - Quantitative
14. State whether each of the following variables is qualitative or quantitative and indicate
the measurement scale that is appropriate for each.
i) Annual sales - Quantitative, Ratio
ii) Soft drink size (small, medium, large) - Qualitative, Ordinal
iii) Employee classification (GSI through GSIS) - Qualitative, Ordinal
iv) Earning per share - Quantitative, Ratio
v) Methods of payments (cash, check, credit card) - Qualitative, Nominal

UNIT 3
1. Classification is a systematic grouping of the units according to their common
characteristics.
2. Classification reduces bulk of the data.
3. Classification of data that are non-measurable is known as Attributes.
4. Classification done according to two attributes or variables is Two-Way Classification.
5. Manifold classification involves more than two variables.
6. Data arranged according to time of occurrence is known as Chronological classification.
7. Geographical classification means classification of data according to:
i) Location
ii) Time
iii) Attributes
iv) Class intervals
8. Classification is a process of arranging the data into:
i) Different columns
ii) Different rows
iii) Different rows and columns
iv) Groups of related facts in different classes
9. The data that can be classified on the basis of time is:
i) Geographical
ii) Chronological
iii) Qualitative
iv) Quantitative
10. State True or False
i. Tabulation presents the data in a minimum space. - TRUE
ii. Tabulation is a process of analysis - FALSE
iii. General purpose table deals with specific objectives. - FALSE
iv. Derived tables deal with total, percentages, ratios, etc - TRUE
11. i) If the data readings are 3, 4, 5, 6, 7, then it is called discrete variable. Height is generally
continuous variable.
ii) There are five derived frequency distributions for any frequency distribution.
iii) Width of class-interval is given by the difference between upper class limit and lower
class limit.
iv) There are two marginal distributions for a distribution.
v) Sturge’s formula is used to calculate the number of class-intervals.
vi) The relative frequency distribution is obtained from frequency distribution by calculating
f/N.
12. i) Diagrams give an accurate value. (True/False)
ii) Pie diagram is drawn according to degree subtended at the centre of a circle. (True/False)
iii) Simple bar diagram is drawn for multiple characteristics. (True/False)
13. The graph plotted in the form of series of rectangles is
i) Frequency
ii) Frequency polygon
iii) Pie
iv) Histogram
14. The diagram which is used to show percentage breakdown is
i) A circle
ii) A square
iii) A pie diagram
iv) A rectangle
15. A line graph indicates
i) Comparison
ii) Variation
iii) Range
iv) All the above
16. Which of the following is not a type of bar chart?
i) Multiple
ii) Percentages
iii) Subdivided
iv) Ogive

UNIT 4
1. State whether the following questions are ‘True’ or ‘False’.
i. For a given set of values if we add a constant 5 to every value, then the arithmetic mean is
affected. - TRUE
ii. Arithmetic mean can be calculated for distribution with open-end classes. - FALSE
iii. Arithmetic mean is affected by extreme values. - TRUE
iv. Arithmetic mean of 12, 16, 23, 25, 28, 32 is 22. - FALSE
2. A single value within the range of the entire mass of data that is used to represent the whole
data is
i) Measures of Central tendency
ii) Statistics
iii) Measures of Dispersion
iv) Skewness
3. Find the Arithmetic mean 68,41,75,91,53,86,59
i) 67.57
ii) 47.57
iii) 37.57
iv) 27.57
4. The average computed by considering the relative importance of each of values to the total
value, is called
i) Arithmetic mean
ii) Geometric mean
iii) Weighted arithmetic mean
iv) Harmonic average.
5. State whether the following questions are ‘True’ or ‘False’.
i) Mode is based on all values - FALSE
ii) Mode = 3 Median – Mean - FALSE
iii) Geometric mean is used when we are interested in rate of growth of any phenomena -
TRUE
iv) Harmonic mean exists if one of the values is zero. - FALSE
v) A.M < G.M < H.M for any two values ‘a’ and ‘b’. - FALSE
vi) Arithmetic mean can be calculated accurately even when the distribution has open-end
class. - FALSE
vii) Mode can be located graphically. - TRUE
viii) Mode is used when data is on interval scale. - TRUE
6. If the values of the variables are arranged in ascending order of magnitude, the middle term
is
i) mean
ii) mode
iii) median
iv) quartile
7. In a symmetrical distribution the mean, median and mode
i) differ
ii) coincide
iii) mean-median = mode
iv) differ by 0.5
8. The relation between mean, median and mode is given by
i) Mode= 3 Median-2 Mean
ii) Mode=2 Mean-Median
iii) Mode= 3Median –Mean
iv) Mode= Mean- Median
9. The harmonic mean of 30 and 20 is
i) 25
ii) 24
iii) 20
iv) 30
10. If assumed mean A = 32.5, i = 8, ∑fd = -13 and ∑f = 90, then
i) mean = 35.31
ii) mean=31.35
iii) mean = 33.15
iv) mean=35.35
11. In any distribution when the original items differ in size, the value of Arithmetic mean
(AM), Geometric mean (GM) and Harmonic mean (HM) would also differ in the following
order
i) AM>GM>HM
ii) AM=GM=HM
iii) AM<HM<GM
iv) AM.GM>HM
12. State whether the following questions are ‘True’ or ‘False’.
i) Quartiles are positional value. - TRUE
ii) Quartiles help us to find percentage of readings below or above a certain value. - TRUE
iii) Q2 = P50 = D7 = Median - FALSE
13. State whether the following questions are ‘True’ or ‘False’.
i) The cost of living index numbers calculated are based on weighted averages. - TRUE
ii) Many of the items which we use in our life can be assigned weights. - TRUE
14. State whether the following questions are ‘True’ or ‘False’
i. Standard deviation is based on all the values. -TRUE
ii. Standard deviation of a set of values is increased if every value of the set is increased by a
constant. - FALSE
iii. Standard deviation can be calculated for distributions with open-end classes. - FALSE
iv. Coefficient of variation can be used to compare the variability of two sets of data measuring
the same characteristics. - TRUE

UNIT 5
1. To which approach does the following probability estimates belong:
i. Probability that India will win the game - Subjective approach
ii. Probability that Mr. Ram will resign from the post - Subjective approach
iii. Probability of drawing a red card - Mathematical approach
iv. Probability that you will go to America this year - Subjective approach
2. Find the probabilities in the following cases:
i. Getting an even number when a die is thrown – 1/2
ii. Getting 53 Mondays in ordinary year – 1/7
3. Given P(A) = 0.6, P(B) = 0.7, and P(A ∩ B) = 0.5. Find P(A ∪ B).
Ans.3 P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.6 + 0.7 – 0.5 = 0.8
4. State whether the following questions are true or false:
i. Bayes’ probability estimates sample value - FALSE
ii. Conditional probability can incorporate costs - FALSE
iii. Bayes’ probability gives up to date information - TRUE
5. Fill in the blanks:
i. For a random variable ∑ P(Xi) =1.
ii. Expectation of a random variable is same as mean of the probability distribution of that
variable.
iii. Var(X) = E(X²) – [E(X)]².

UNIT 6
1. State whether the following statements are ‘True’ or ‘False’.
i) The sum of probabilities sometimes will be greater than 1. - FALSE
ii) The amount of time you study for an exam is a discrete random variable. - FALSE
iii) The Bernoulli distribution has only one parameter ‘p.’ - TRUE
2. State whether the following statements are ‘True’ or ‘False’.
i) Mean of binomial distribution is ‘npq.’ - FALSE
ii) ‘n’ and ‘p’ are the parameters of Binomial distribution. - TRUE
iii) If the mean and variance of a Binomial distribution are 6 and 5, then p = 1/6. - TRUE
iv) Each trial in a binomial experiment has the different probability of success ‘p’- FALSE
3. State whether the following statements are ‘True’ or ‘False’
i) ‘X’ is a Poisson variate if p < 0.1 and n > 10. - TRUE
ii) Poisson distribution is a unimodal distribution. - TRUE
4. State whether the following statements are ‘True’ or ‘False’.
i) Quartile deviation of normal distribution is 4/ 5 𝜎. - FALSE
ii) Mean and standard deviation of Standard normal distribution are ‘1’ and ‘0’. - FALSE
iii) Mean, Median and Mode coincide in a normal distribution - TRUE

UNIT 7
1. State whether the following statements are True or False.
i) Population is aggregate of objects under study. - TRUE
ii) Sampling method consume time and resources. - FALSE
iii) Population is a subset of sample. - FALSE
iv) An unbiased sample gives an accurate prediction of characteristics of an entire population.
- TRUE
v) The standard deviation of sampling distribution of a statistic is known as standard error of
that statistic. - TRUE
vi) Standard error is used as a reliability measure. - TRUE
vii) Faulty selection of sample contributes to sampling error. - TRUE
viii) Personal bias increases the non-sampling errors. - TRUE
ix) Unbiased errors are cumulative in nature. - FALSE
2. State whether the following statements are true ‘T’ or false ‘F’.
i) Sample in which units are selected by judgment is known as probability sample. - FALSE
ii) Judgment sampling does not give representativeness of a sample. - TRUE
iii) Large sample size always results in minimising the standard error. - TRUE
iv) A sampling plan that divides the population into well-defined groups from which random
samples are drawn is known as cluster sampling. - FALSE
v) The principles of simple random sampling are the theoretical basis for statistical inference.
- TRUE
vi) If the mean of a certain population is 20, it is likely that most of the sample means will be
20. - FALSE
vii) Any sampling distribution can be totally described by its mean and standard deviation. -
FALSE
viii) The central limit theorem assures the sampling distribution of the mean approaches
normal distribution as the sample size increases. - TRUE
ix) Stratified sampling is used when each group considered are more homogenous within itself
and heterogeneous between group. - FALSE

UNIT 8

1. XY Pizza has developed quite a business in Bangalore by delivering pizza orders promptly.
It guarantees that its pizzas will be delivered in 30 minutes or less from the time the order was
placed, and if the delivery is late, the pizza is free. The time that it takes to deliver each pizza
order that is on time is recorded in the Pizza Time Book (PTB), and the delivery time for
those pizzas that are delivered late is recorded as 30 minutes in the PTB. A sample of 12
random entries from the PTB is depicted in table 8.5.
Table 8.5: Twelve Random Entries of Pizza Delivery Time
15.3 29.5 30 10.1 30 19.6 10.8 12.2 14.8 30 22.1 18.3
i) Find the mean for the sample.
ii) From what population was this sample drawn?
iii) Can this sample be used to estimate the average time that it takes for XY Pizza to deliver
a pizza? Explain.
Ans.1 i) For the given sample the mean is 20.225 minutes.
ii) The sample was drawn from the population of delivery times recorded in the Pizza Time
Book (PTB) of XY Pizza.
iii) No. Every delivery that takes longer than 30 minutes is recorded as 30 minutes, so the
sample mean will underestimate the true average delivery time.
2. Madhu, a frugal student, wants to buy a used bike. After randomly selecting 125 wanted
advertisements, he found the average price of the bike to be Rs. 3250 with a standard deviation
of Rs. 615. Establish an interval estimate for the average price of bike so that Madhu can be:
i) 68.3% certain that the population mean lies in this interval.
ii) 95.5% certain that the population mean lies in this interval.
Ans.2 The sample standard deviation, sample size and sample mean are given as:
σs = 615; n = 125; X̄ = 3250
and the standard error is calculated as:
σx̄ = σs / √n = 615 / √125 = 55.01
i) X̄ ± 1σx̄ = 3250 ± 55.01, giving an interval from 3194.99 to 3305.01, to be 68.3% certain.
ii) X̄ ± 2σx̄ = 3250 ± 110.02, giving an interval from 3139.98 to 3360.02, to be 95.5% certain.
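
The interval estimates for the bike price can be reproduced with a short Python sketch. The function name `interval_estimate` and its parameter names are hypothetical. Because the code keeps the unrounded standard error, its limits may differ from a hand calculation in the second decimal place.

```python
from math import sqrt


def interval_estimate(sample_mean, sample_sd, n, k):
    """Return (lower, upper) limits of mean ± k standard errors."""
    standard_error = sample_sd / sqrt(n)
    return sample_mean - k * standard_error, sample_mean + k * standard_error


# k = 1 corresponds to about 68.3% certainty, k = 2 to about 95.5%
low_68, high_68 = interval_estimate(3250, 615, 125, 1)
low_95, high_95 = interval_estimate(3250, 615, 125, 2)

print(f"68.3% interval: Rs. {low_68:.2f} to Rs. {high_68:.2f}")
print(f"95.5% interval: Rs. {low_95:.2f} to Rs. {high_95:.2f}")
```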
3. Given the following confidence levels, express the lower and upper limits of the confidence
interval for these levels in terms of X̄ and σx̄ (use the normal distribution tables).
i) 54 percent - X̄ ± 0.74σx̄
ii) 75 percent - X̄ ± 1.15σx̄
iii) 94 percent - X̄ ± 1.88σx̄
iv) 98 percent - X̄ ± 2.33σx̄
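
The multipliers read from the normal distribution tables can be checked with Python's standard library. This is a sketch that assumes Python 3.8 or later (for `statistics.NormalDist`); the helper name `z_for_confidence` is hypothetical. For a two-sided interval at confidence level c, the multiplier is the z value with cumulative probability 0.5 + c/2.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z multiplier for a two-sided confidence interval at the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.54, 0.75, 0.94, 0.98):
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence: z = {z:.2f}")
```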
4. From a population of 540, a sample of 60 individuals is taken. From this sample the mean
is found to be 6.2 and the standard deviation to be 1.368.
i) Find the estimated standard error of the mean.
ii) Construct a 96 % confidence interval of the mean.
Ans.4 i. σx̄ = (σ / √n) × √[(N − n) / (N − 1)] = (1.368 / √60) × √[(540 − 60) / (540 − 1)] = 0.167

ii. X̄ ± 2.05σx̄ = 6.2 ± 2.05(0.167) = 6.2 ± 0.34
Hence, the LCL and UCL are 5.86 and 6.54 respectively.
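
Because the sample of 60 is a sizeable fraction of the population of 540, the standard error includes the finite population multiplier. A minimal Python sketch of the calculation follows; the function name `finite_population_se` is hypothetical, and the multiplier z = 2.05 for 96% confidence is taken from the answer above.

```python
from math import sqrt


def finite_population_se(sd, n, population_size):
    """Standard error of the mean with the finite population multiplier."""
    multiplier = sqrt((population_size - n) / (population_size - 1))
    return sd / sqrt(n) * multiplier


se = finite_population_se(sd=1.368, n=60, population_size=540)
z = 2.05  # two-sided 96% confidence, from the normal table
lcl = 6.2 - z * se
ucl = 6.2 + z * se

print(f"Standard error = {se:.3f}")
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```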
5. For the following sample sizes and confidence levels, find the approximate ‘t’ values for
constructing confidence intervals (use the ‘t’ table).
i) n = 28; 95%
ii) n = 8; 98%
iii) n = 13; 90%
iv) n = 25; 95%
Ans.5 To find the approximate 't' values for constructing confidence intervals, we refer to
the t-distribution table. The degrees of freedom are calculated as df = n − 1, where n is the
sample size, and each value is read for a two-sided interval at the stated confidence level.

i) n = 28; 95%
- Degrees of freedom (df) = 27
- From the t-distribution table, for df = 27 and a confidence level of 95%, the 't' value is
approximately 2.052.

ii) n = 8; 98%
- Degrees of freedom (df) = 7
- From the t-distribution table, for df = 7 and a confidence level of 98%, the 't' value is
approximately 2.998.

iii) n = 13; 90%
- Degrees of freedom (df) = 12
- From the t-distribution table, for df = 12 and a confidence level of 90%, the 't' value is
approximately 1.782.

iv) n = 25; 95%
- Degrees of freedom (df) = 24
- From the t-distribution table, for df = 24 and a confidence level of 95%, the 't' value is
approximately 2.064.

UNIT 9

1. For the following cases: specify which probability distribution to use in hypothesis testing:
i. H0: 𝜇 = 27, H1: 𝜇 ≠ 27, X̄ = 33, sample 𝜎 = 4, n = 25 - ‘t’ distribution (sample S.D. and n < 30)
ii. H0: 𝜇 = 98.6, H1: 𝜇 > 98.6, X̄ = 99.1, 𝜎 = 1.5, n = 50 - Normal distribution
iii. H0: 𝜇 = 3.5, H1: 𝜇 < 3.5, X̄ = 2.8, sample 𝜎 = 0.6, n = 18 - ‘t’ distribution
iv. H0: 𝜇 = 57, H1: 𝜇 > 57, X̄ = 65, sample 𝜎 = 12, n = 42 - Normal distribution

2. i) Null hypothesis states that there is a significant difference between observed and
hypothetical values. (True/False)
ii) 1% level of significance means we are ready to reject a true hypothesis in 99% of cases.
(True/False)
iii) If the Null hypothesis H0: 𝜇 =𝑋̅or H0: p = ps or H0: 𝜇1 = 𝜇2 or H0: p1 = p2 then it is two-
tailed test. (True/False)
iv) If the calculated value of a statistic is not in the rejection region R, then Ho is accepted.
(True/False)
v) 1 - 𝛽 is called power of the test. (True/False)
vi) If n1 = 300, n2 = 500, 𝜇1 = 50, 𝜇2 = 60, 𝜎1 = 10, 𝜎2 = 12 are results of two samples taken
from two cities A and B, then we test for the difference between means of two different
populations. (True/False)
vii) If n < 30, then we do not apply z test unless, population S.D is known. (True/False)

3. i) ‘t’ distribution is continuous probability distribution.


ii) ‘t’ distribution’s parameter is degree of freedom.
iii) The mean and variance of the ‘t’ distribution are zero and greater than one.

UNIT 10
1. χ² test is a non-parametric test.
2. A table with 4 rows and 2 columns has the degrees of freedom of 3.
3. χ² test is wholly based on sample data.
4. If there are four rows and five columns in classification for χ² test, then the number of
degrees of freedom is equal to 12.
5. If the calculated χ² value is less than the tabulated χ² value, then the null hypothesis is
not rejected.

UNIT 11
1. State whether the following statements are ‘True’ or ‘False’
i) Analysis of variance is useful to test several means. - TRUE
ii) Another tool applied to test several means is Z/t–test. - FALSE
iii) F-ratio is always calculated with respect to mean square error. - TRUE
iv) The F-distribution curve depends on the degrees of freedom. - TRUE
v) In applying analysis of variance, the sample sizes must be equal. - FALSE
vi) In one-way ANOVA, the null hypothesis always states that all the population means are
different. - FALSE (the null hypothesis states that all the population means are equal)
vii) The F-statistic is the ratio of variance between the samples to the variance within the
samples. - TRUE

2. If we take only one factor and investigate the difference amongst its various categories
having numerous possible values, we are said to use
i) Two-way ANOVA
ii) One-way ANOVA
iii) Multi-way ANOVA
iv) Four-way ANOVA

3. The sum of squares for variance between samples is 8 and the sum of
squares for variance within samples is 24, then the sum of squares for
total variance is
i) 16
ii) 32
iii) 48
iv) 8

4. A test used as a test of goodness fit is


i) Chi-square test
ii) Z-test
iii) t-test
iv) u-test

5. A test used to compare the variance of the two independent samples is


i) F- test
ii) Z- test
iii) t - test
iv) u –test

UNIT 12
Calculate the required correlation coefficients.
1. i. From the following data, calculate the correlation between variables 1 and 2 keeping the
3rd constant.
r12 = 0.7; r13 = 0.6 r23 = 0.4
Ans.i The correlation between variables 1 and 2 keeping the 3rd constant is given by:
r12.3 = (r12 − r13 × r23) / [√(1 − r13²) × √(1 − r23²)]
      = (0.7 − 0.6 × 0.4) / [√(1 − 0.6²) × √(1 − 0.4²)] = 0.46 / 0.733 = 0.627

ii. Calculate r23.1 and r13.2 from the following:


r12 = 0.60; r13 = 0.51; r23 = 0.40
Ans.ii The correlation between variables 2 and 3 keeping the 1st constant is given by:
r23.1 = (r23 − r21 × r13) / [√(1 − r21²) × √(1 − r13²)]
      = (0.40 − 0.60 × 0.51) / [√(1 − 0.60²) × √(1 − 0.51²)] = 0.094 / 0.688 = 0.137
The correlation between variables 1 and 3 keeping the 2nd constant is given by:
r13.2 = (r13 − r12 × r23) / [√(1 − r12²) × √(1 − r23²)]
      = (0.51 − 0.60 × 0.40) / [√(1 − 0.60²) × √(1 − 0.40²)] = 0.27 / 0.733 = 0.37

iii. Given the zero order correlation coefficients, calculate the partial correlation between
variables 1 and 3 keeping the 2nd variable constant. Interpret your result.
r12 = 0.8; r13 = 0.6; r23 = 0.5
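Ans.iii The correlation between variables 1 and 3 keeping the 2nd constant is given by:
r13.2 = (r13 − r12 × r23) / [√(1 − r12²) × √(1 − r23²)]
      = (0.6 − 0.8 × 0.5) / [√(1 − 0.8²) × √(1 − 0.5²)] = 0.2 / 0.520 = 0.385
Interpretation: once variable 2 is held constant, the correlation between variables 1 and 3
falls from 0.6 to about 0.385, so part of their association is due to their common
relationship with variable 2.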
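
The first-order partial correlation formula used in parts i to iii can be wrapped in one small function. The sketch below is illustrative; the function name `partial_corr` and its argument names are hypothetical. Here `r_xy` is the correlation of interest, while `r_xz` and `r_yz` are the correlations of each variable with the one held constant. Results agree with a hand calculation up to rounding of the intermediate square roots.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))


r12_3 = partial_corr(0.7, 0.6, 0.4)       # part i:   r12.3
r23_1 = partial_corr(0.40, 0.60, 0.51)    # part ii:  r23.1
r13_2 = partial_corr(0.51, 0.60, 0.40)    # part ii:  r13.2
r13_2_iii = partial_corr(0.6, 0.8, 0.5)   # part iii: r13.2

print(f"r12.3 = {r12_3:.3f}")
print(f"r23.1 = {r23_1:.3f}")
print(f"r13.2 = {r13_2:.3f}")
print(f"r13.2 (part iii) = {r13_2_iii:.3f}")
```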
2. State whether the following statements are ‘True’ or ‘False’.
i. Scatter diagram does not give us a quantitative measure of correlation coefficient. - TRUE
ii. Correlation estimates the value of one variable from the knowledge of the other. - FALSE
iii. Correlation coefficient is an absolute measure. - FALSE

3. State whether the following statements are ‘True’ or ‘False’.


i. Correlation coefficient is a geometric mean between regression coefficients. - TRUE
ii. The regression lines always intersect at (X̄, Ȳ). - TRUE
iii. bxy = r × (σx / σy). - TRUE
iv. The higher the angle between the regression lines, the lower is the correlation
coefficient. - TRUE

UNIT 13
State whether the following statements are ‘True’ or ‘False’.
1. Forecast is an estimate based solely on past data of the series under investigation. - FALSE
2. In time series analysis method a comparative study of variations can be made. - TRUE
3. In exponential smoothing, old observations are given increasing exponential weightage.
- FALSE

UNIT 14
1. State ‘True’ or ‘False’
i) ‘The prices of cooking oils reduce after the harvesting of oil seeds and go up after some
time’ is an example of cyclic variations in a time series. – FALSE
ii) The effect of national strikes, floods, earthquakes are examples of random variations in time
series. - TRUE

2. Fill in the following blanks.


i) A set of numerical values observed at regular intervals of time is called time series.
ii) Long term movements in time series are called secular trend.
iii) Variations that occur within a year are known as seasonal variations.
iv) Semi averages method is used to measure trend.
v) Method of moving averages does not show any functional relationship.

UNIT 15
1. Find out the price index number using simple aggregate method for the data represented in
table 15.3.
Table 15.3: Price of the Commodities for Years 2001 and 2002
Commodity    Price in Rs. per Quintal
             Base year 2001 (p0)    Current year 2002 (p1)
A            80                     100
B            120                    250
C            100                    150
D            200                    300

Ans.1
Here, ∑p0 = 500, ∑p1 = 800
P01 = (∑p1 / ∑p0) × 100 = (800 / 500) × 100 = 160

2. The data in table 15.10 is related to workers in an industrial town. Calculate consumer price
index number by using family budget method.
Table 15.10: Price Index and Percentage Expenditures of Items
Item of Consumption    Price Index (P)    Percentage Expenditure
Food                   200                50
Clothing               175                10
Fuel & Lighting        160                12
Housing                225                15
Miscellaneous          150                13
Ans.2
Item of Consumption    Price Index (P)    Weight (W)    PW
Food                   200                50            10000
Clothing               175                10            1750
Fuel & Lighting        160                12            1920
Housing                225                15            3375
Miscellaneous          150                13            1950
Total                                     ∑W = 100      ∑PW = 18995

Consumer price index number by family budget method is given by:

P01 = ∑PW / ∑W = 18995 / 100 = 189.95

Hence, the consumer price index number by family budget method is 189.95.
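
The family budget method is a weighted average of the group price indices, which can be sketched in Python as follows. The function name `family_budget_index` and the dictionary layout are illustrative assumptions; the figures are those of table 15.10.

```python
def family_budget_index(groups):
    """Consumer price index by the family budget method: sum(P*W) / sum(W)."""
    total_pw = sum(p * w for p, w in groups.values())
    total_w = sum(w for p, w in groups.values())
    return total_pw / total_w


# item of consumption -> (price index P, percentage expenditure W)
groups = {
    "Food": (200, 50),
    "Clothing": (175, 10),
    "Fuel & Lighting": (160, 12),
    "Housing": (225, 15),
    "Miscellaneous": (150, 13),
}

cpi = family_budget_index(groups)
print(f"Consumer price index = {cpi:.2f}")
```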

MODEL QUESTION PAPER


SECTION A
Multiple Choice Questions (2 Marks each)

1. Which of the following statement is true?


a) Statistics enlarges physical vision
b) Statistics helps in estimation
c) Statistics quantifies uncertainty
d) Statistics is of no use to humanity.

2.The data that can be classified on the basis of time is:


a) Geographical
b) Chronological
c) Qualitative
d) Quantitative

3. In any distribution when the original items differ in size, the value of Arithmetic mean
(AM), Geometric mean (GM) and Harmonic mean (HM) would also differ in the following
order
a) AM>GM>HM
b) AM=GM=HM
c) AM<HM<GM
d) AM.GM>HM

4. Non-Sampling errors include


i)bias
ii) mistakes
iii) both bias & mistakes
iv) none of these

5) Which of the following factors does not affect the width of a confidence interval?
i) Sample size
ii) Confidence desired
iii) Variability in the population
iv) Population size

6. The sum of squares for variance between samples is 8 and the sum of squares for variance
within samples is 24, then the sum of squares for total variance is
a) 16
b) 32
c) 48
d) 8
7) ‘The prices of cooking oils reduce after the harvesting of oil seeds and go up after some
time’ is an example of cyclic variations in a time series.
a) True
b) False

8) Which of the following is not a forecasting model?


a) Trend method
b) End-use method
c) Correlation Method
d) Exponential Method

9) What test would you use to determine whether a set of observed frequencies differ from
their corresponding expected frequencies?
a) The t test for dependent samples
b) The Chi-Square test
c) The t test for independent samples
d) The F test

10) Geographical classification means classification of data according to:


a) Location
b) Time
c) Attributes
d) Class intervals

Section B
Short Answers (5 Marks each)

a) Find the Standard Deviation of (Rs.) 7, 9, 16, 24, 26.


Ans.a Given numbers are 7, 9, 16, 24, 26
Mean of given numbers = (7 + 9 + 16 + 24 + 26) / 5 = 82 / 5 = 16.4
Standard Deviation = √{[(7 − 16.4)² + (9 − 16.4)² + (16 − 16.4)² + (24 − 16.4)² + (26 − 16.4)²] / 5}
= √(293.2 / 5) = √58.64 = 7.66
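
The same result can be checked with Python's standard library. This sketch assumes the population form of the standard deviation (dividing by n = 5), which is the form used in the working above; `statistics.pstdev` computes that form.

```python
from statistics import mean, pstdev

values = [7, 9, 16, 24, 26]

average = mean(values)
# pstdev divides the sum of squared deviations by n (population form)
std_dev = pstdev(values)

print(f"Mean = {average}")
print(f"Standard deviation = {std_dev:.2f}")
```

If the sample form (dividing by n − 1) were required instead, `statistics.stdev` would be the function to use.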

b) The probabilities that component A and component B of a machine will fail are 0.09
and 0.06 respectively. The machine will fail if any one of them fails. Find the probability
that it will fail?
Ans.b Let P(A) be the probability that component A fails, and P(B) be the probability that
component B fails. Assuming the two components fail independently, the machine keeps
working only if neither component fails, so the probability that the machine does not fail,
P(Not Failing), is given by:

P(Not Failing) = P(A does not fail) × P(B does not fail)

Given that P(A) = 0.09 and P(B) = 0.06:
P(Not Failing) = (1 − 0.09) × (1 − 0.06)
P(Not Failing) = 0.91 × 0.94
P(Not Failing) = 0.8554

P(Failing) = 1−0.8554
P(Failing) = 0.1446

c) An unbiased coin is tossed six times. What is the probability that the tosses will result
in:
i) Exactly two heads
ii) At least five heads
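Ans.c) For n independent tosses with probability p of a head on each toss, the binomial formula gives
P(X = k) = (nCk) ⋅ p^k ⋅ (1 − p)^(n − k), with n = 6 and p = 0.5.

i) Exactly two heads:

P(X = 2) = (6C2) ⋅ (0.5)^2 ⋅ (1 − 0.5)^(6 − 2)
P(X = 2) = 15 ⋅ 0.25 ⋅ 0.0625
P(X = 2) = 0.234

ii) At least five heads:

P(X ≥ 5) = P(X = 5) + P(X = 6)
P(X ≥ 5) = (6C5) ⋅ (0.5)^5 ⋅ (1 − 0.5)^(6 − 5) + (6C6) ⋅ (0.5)^6 ⋅ (1 − 0.5)^(6 − 6)
P(X ≥ 5) = 6 ⋅ 0.015625 + 1 ⋅ 0.015625
P(X ≥ 5) = 0.109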
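
The binomial probabilities can be verified with a few lines of Python. This is a sketch using only the standard library; `binomial_pmf` is a helper name introduced here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.5  # six tosses of an unbiased coin

p_exactly_two = binomial_pmf(2, n, p)
p_at_least_five = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)

print(f"P(X = 2)  = {p_exactly_two:.6f}")
print(f"P(X >= 5) = {p_at_least_five:.6f}")
```

Because p = 0.5, every individual outcome has probability (0.5)^6 = 1/64, so the two answers are simply 15/64 and 7/64.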

d) A production company has 350 hourly employees having average 37.6 years of age,
with a standard deviation of 8.3. If the sample average is 40 years of age and z-value is
2.07, calculate the required sample size.
Ans.d Given:
• The production company has 350 hourly employees with an average age of μ = 37.6 years and
a standard deviation of σ = 8.3
• The sample average is x̄ = 40 years of age
• The z-value is 2.07
We know that z = (x̄ − μ) / SE and SE = σ / √n, so
SE = σ / √n = 8.3 / √n
z = (x̄ − μ) / (8.3 / √n)
2.07 = (40 − 37.6) / (8.3 / √n)
√n = (2.07)(8.3) / 2.4 ≈ 7.16
n ≈ 51.25 ≈ 51
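
The rearrangement above can be sketched in Python as follows. The variable names are illustrative, and the sketch follows the worked answer in ignoring any finite-population correction for the 350 employees.

```python
sigma = 8.3   # population standard deviation
mu = 37.6     # population mean age
x_bar = 40    # sample mean age
z = 2.07      # given z-value

# From z = (x_bar - mu) / (sigma / sqrt(n)), solve for sqrt(n)
sqrt_n = z * sigma / (x_bar - mu)
n = sqrt_n ** 2

print(f"sqrt(n) = {sqrt_n:.2f}, n = {n:.2f}, rounded n = {round(n)}")
```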

e) Three varieties of crops ‘A’, ‘B’, and ‘C’ are tested in a randomized block design with
four replications. The yields are depicted in table 11.6. Test at the 0.05 level of significance
whether there is a difference between replications. Test also whether the varieties differ
significantly.

Variety    Replications
           1    2    3    4
A          6    4    8    6
B          7    6    6    9
C          8    5   10    9
Ans.e N = 12, T = Sum of all values = 84
Correction Factor = T² / N = 84² / 12 = 588
SST (Total Sum of Squares) = Sum of squares of all observations − T² / N
= (6² + 4² + 8² + 6² + … + 10² + 9²) − 588 = 624 − 588 = 36

SSC (between replications, i.e. columns):

SSC = [(∑X1i)² / n1 + (∑X2i)² / n2 + ⋯ + (∑Xni)² / nn] − T² / N
= 21² / 3 + 15² / 3 + 24² / 3 + 24² / 3 − 588
= 147 + 75 + 192 + 192 − 588 = 18

Degrees of freedom = (c − 1) = (4 − 1) = 3

MSC = SSC / (c − 1) = 18 / (4 − 1) = 18 / 3 = 6

SSR (between varieties, i.e. rows):

SSR = [(∑X1j)² / n1 + (∑X2j)² / n2 + ⋯ + (∑Xnj)² / nn] − T² / N
= 24² / 4 + 28² / 4 + 32² / 4 − 588
= 144 + 196 + 256 − 588 = 8

Degrees of freedom = (r − 1) = (3 − 1) = 2

MSR = SSR / (r − 1) = 8 / (3 − 1) = 8 / 2 = 4

SS residual or error: SSE = SST − SSC − SSR = 36 − 18 − 8 = 10

MSE = SSE / [(r − 1)(c − 1)] = 10 / 6 = 1.67

Since MSC > MSE we take Fc = MSC / MSE, and since MSR > MSE we take Fr = MSR / MSE.

Fc = MSC / MSE = 6 / (10 / 6) = 3.60
Fr = MSR / MSE = 4 / (10 / 6) = 2.40
For Replication:
The calculated value of Fc is 3.60. The table value of F for (3, 6) df at the 5% level of significance
is 4.76. Since the calculated value of F is less than the table value, we do not reject the null hypothesis
and conclude that the difference between replications is not significant.
For Variety:
The calculated value of Fr is 2.40. The table value of F for (2, 6) df at the 5% level of significance
is 5.14. Since the calculated value of F is less than the table value, we do not reject the null hypothesis
and conclude that the difference between varieties is not significant.
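
The sums of squares and F ratios can be cross-checked with a plain-Python sketch of the two-way ANOVA (one observation per cell). The variable names are illustrative, and the critical values 4.76 and 5.14 are simply copied from the answer above rather than computed.

```python
# Yields from the question: rows are varieties, columns are replications
yields = {
    "A": [6, 4, 8, 6],
    "B": [7, 6, 6, 9],
    "C": [8, 5, 10, 9],
}
rows = list(yields.values())
r, c = len(rows), len(rows[0])   # 3 varieties, 4 replications
n = r * c

total = sum(sum(row) for row in rows)
cf = total**2 / n                # correction factor T^2 / N

sst = sum(x * x for row in rows for x in row) - cf
ssr = sum(sum(row) ** 2 for row in rows) / c - cf            # between varieties
ssc = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf      # between replications
sse = sst - ssc - ssr

msc = ssc / (c - 1)
msr = ssr / (r - 1)
mse = sse / ((r - 1) * (c - 1))

f_c = msc / mse   # replications, df = (3, 6)
f_r = msr / mse   # varieties, df = (2, 6)

# Table values at the 5% level, as quoted in the answer
print(f"F (replications) = {f_c:.2f}, significant: {f_c > 4.76}")
print(f"F (varieties)    = {f_r:.2f}, significant: {f_r > 5.14}")
```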

f. What are the characteristics of an index number?


Ans.f
1. Expressed in numbers: Index numbers represent relative changes, such as an increase in
production or a reduction in prices, in numerical form.
2. Expressed in percentage: Index numbers are expressed as percentages so as to
show the extent of relative change, with the value of the base taken as 100, but the percentage
sign (%) is not used.
3. Relative measure: Index numbers measure changes which are not capable of direct
measurement.
4. Specialised averages: An index number is a special case of an average, generally a
weighted average. It is a special type of average because, in a simple average, the data are
homogeneous and share the same unit of measurement, whereas the variables averaged in an
index number may have different units of measurement.
5. Basis of comparison: Index numbers by their very nature are comparative. They compare
changes over time or between places or similar categories.

Section C
Long Answers (10 Marks each)

1. Distinguish between:

a) Primary and secondary data

Primary Data | Secondary Data
Original data collected firsthand for a specific purpose. | Existing data collected by someone else for a different purpose.
Directly obtained from individuals, surveys, experiments, observations, etc. | Derived from sources like books, articles, databases, reports, etc.
Typically more reliable and accurate, as it is specific to the researcher's needs. | May lack the precision and relevance required for a particular study.
Example: Surveys, interviews, experiments conducted by the researcher. | Example: Census data, academic papers, reports published by government agencies, etc.

b) Direct and indirect investigation


Direct Investigation | Indirect Investigation
Involves the researcher directly gathering data firsthand. | Involves using existing sources or intermediaries to obtain information.
Researchers have more control over the data collection process. | Researchers have limited control as they rely on external sources.
Generally more time-consuming due to direct interaction and data collection. | Can be quicker as researchers use pre-existing data or intermediaries.
Example: Surveys, interviews, experiments conducted by the researcher. | Example: Using third-party reports, published data, or hiring a data collection agency.

c) Questionnaire and schedule


Questionnaire | Schedule
A written set of questions designed to gather information from respondents. | A structured set of questions administered by an interviewer.
Can be self-administered or administered by an interviewer. | Administered by an interviewer in a face-to-face setting.
Can be structured or unstructured; may include open-ended questions. | Typically highly structured, with predetermined questions and response options.
Offers more flexibility in terms of when and where respondents can answer. | Less flexibility as it requires an interviewer to administer.

2. A factory has three machines M1, M2 and M3. They produce 4000, 10,000 and 6,000
products per day. From past records, it is known that M1, M2, and M3 produce 5%, 4%,
and 8% defectives. A product is selected at random from the day’s production and is
found to be defective. What is the probability that it was not produced by machine M3?
Ans.2
Calculating the total number of defectives:
Total defectives = (Defectives from M1) + (Defectives from M2) + (Defectives from M3)
Total defectives = (0.05 * 4000) + (0.04 * 10000) + (0.08 * 6000)
Total defectives = 200 + 400 + 480
Total defectives = 1080

Probability of a defective product not being produced by M3:


Probability = (Defectives not from M3) / (Total defectives)
Probability = (Total defectives - Defectives from M3) / (Total defectives)
Probability = (1080 - 0.08 * 6000) / 1080
Probability = (1080 - 480) / 1080
Probability = 600 / 1080
Probability ≈ 0.5556

Therefore, the probability that a defective product was not produced by machine M3 is
approximately 0.5556 or 55.56%.
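
The counting argument above is an application of Bayes' theorem, and it can be sketched in Python as follows. The dictionary names are illustrative; the figures come from the question.

```python
# Daily output and defect rate for each machine, from the question
output = {"M1": 4000, "M2": 10000, "M3": 6000}
defect_rate = {"M1": 0.05, "M2": 0.04, "M3": 0.08}

# Expected number of defectives produced by each machine per day
defectives = {m: output[m] * defect_rate[m] for m in output}
total_defectives = sum(defectives.values())

# Posterior probability that a defective item came from each machine
posterior = {m: defectives[m] / total_defectives for m in output}

p_not_m3 = 1 - posterior["M3"]
print(f"P(not M3 | defective) = {p_not_m3:.4f}")
```

The posterior probabilities for M1 and M2 add up to the same value, which is a useful consistency check.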

3. In a very large organisation, the director wanted to find out what proportion of the
employees prefer to provide their own retirement benefits in lieu of a company-sponsored
plan. A simple random sample of 75 employees was taken. It was found that
40%, that is, 0.4 of them are interested in providing their own retirement plans. The
management requests that we use this sample to find an interval about which they can
be 99 percent confident that it contains the true population proportion.
Ans.3
Confidence Interval = p ± z_critical × √[p(1 − p) / n]

Given:
- Sample proportion (p) = 0.4
- Sample size (n) = 75
- Confidence level = 99% (z-critical value = 2.576)

Now, putting the values:

Confidence Interval = 0.4 ± 2.576 × √[0.4(1 − 0.4) / 75]

Confidence Interval = 0.4 ± 2.576 × √(0.24 / 75)

Confidence Interval = 0.4 ± 2.576 × √0.0032

Confidence Interval = 0.4 ± 2.576 × 0.05657

Confidence Interval = 0.4 ± 0.1457

Confidence Interval = (0.2543, 0.5457)

Therefore, the 99% confidence interval for the true population proportion is approximately
(0.2543, 0.5457).
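
A short Python sketch of the interval calculation is given below. It uses the normal approximation for a sample proportion, as the answer does, and takes z = 2.576 as given; the variable names are illustrative.

```python
import math

p_hat = 0.4   # sample proportion
n = 75        # sample size
z = 2.576     # z-critical value for 99% confidence, as given in the answer

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"SE = {standard_error:.5f}, margin = {margin:.4f}")
print(f"99% CI = ({lower:.4f}, {upper:.4f})")
```

Carrying the unrounded standard error through to the last step avoids small rounding differences in the fourth decimal place.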
4. Calculate Spearman’s rank correlation coefficient between the series A and B depicted
in table:

Series A 57 59 62 63 64 65 55 58 57
Series B 113 117 126 126 130 129 111 116 112
Ans.4

Series A   Rank 1   Series B   Rank 2   D = R1 − R2   D²
57         7.5      113        7        0.5           0.25
59         5        117        5        0             0
62         4        126        3.5      0.5           0.25
63         3        126        3.5      −0.5          0.25
64         2        130        1        1             1
65         1        129        2        −1            1
55         9        111        9        0             0
58         6        116        6        0             0
57         7.5      112        8        −0.5          0.25
                                        ∑D = 0        ∑D² = 3
Here the value 57 is repeated twice in Series A and the value 126 is repeated twice in Series B.
Therefore, m1 = 2 for Series A and m2 = 2 for Series B.
R = 1 − 6[∑D² + (1/12)(m1³ − m1) + (1/12)(m2³ − m2)] / (N³ − N)
R = 1 − 6[3 + (1/12)(8 − 2) + (1/12)(8 − 2)] / (9³ − 9)
= 1 − 6(4) / 720 = 1 − 0.033 = 0.967

ASSIGNMENT QUESTIONS
SET -1

1 Define statistics. Explain various functions of statistics. Also discuss the key limitations
of statistics.
Answer 1: Statistics is a branch of mathematics that deals with the collection, organisation,
analysis, interpretation, and presentation of data. It offers techniques for condensing and
characterising different facets of data, which helps in drawing meaningful findings and
conclusions. Statistics has two main branches: descriptive statistics, which summarises and
presents data, and inferential statistics, which draws conclusions or makes predictions about
a population from a sample. Statistics is essential for
research, decision-making, and understanding patterns and trends in datasets in a variety of
disciplines, including science, business, economics, and social sciences.

Functions of Statistics

1. Description of data: Statistics provides a meaningful and compact description and
summary of data. Using measures such as the mean, median, and standard deviation,
statisticians summarise the main attributes and features of a dataset. Researchers,
analysts, and decision-makers rely on this descriptive role to understand the key
features of the data they have to deal with.

2. Inference: One important use of statistics is inference, which is the process of
estimating or inferring characteristics of a population from a sample. This enables
analysts to extrapolate their results beyond the sample under observation, offering
perspectives on more general patterns and trends. Decision-makers can use statistical
inference to forecast future behaviours or events and make well-informed decisions.

3. Comparison: Statistics makes it easier to compare several groups, variables, or
datasets. By using methods like t-tests and analysis of variance (ANOVA),
analysts can find patterns, differences, and relationships among different data items.
When making decisions, it is important to recognise differences as well as similarities,
which is why comparisons are so important.

4. Exploration: Exploration is the process of locating hidden patterns, relationships,
or anomalies in a dataset. Researchers can examine the data more closely by using statistical
techniques such as correlation analysis, clustering, and data visualisation. This
role is crucial for generating hypotheses and directing future research towards
findings that might not be obvious at first.

5. Prediction: Based on past data, prediction uses statistical models to project future
trends or results. Predictive modelling frequently makes use of regression analysis and
time series analysis. Analysts can assist in planning and strategic decision-making by
using patterns in historical data to inform their projections.

Limitations of Statistics

1. Qualitative Data Exclusion: Statistics deals mostly with quantitative data, such
as numerical values and measurable quantities. Qualitative data, which refers to
traits, meanings, or attributes, is not directly analysed statistically. Inherently
qualitative phenomena such as beauty, intelligence, or emotions present difficulties
for direct statistical quantification.

2. Limitation with individual facts: Statistical methods apply to groups
or aggregates of data rather than to individual facts. An isolated case
cannot be analysed statistically, and statistical methods may not produce
meaningful insights about it. Statistics excels at dealing with
patterns and trends seen across collections of data points.

3. Probabilistic Nature of Statistical Inferences: Statistical inferences are
probabilistic in nature, providing insights at the average or aggregate level. While
statistical evaluations reveal useful trends, estimates, and relationships, they do not
provide exact results that apply uniformly to every case. In a dataset of heights, for
example, an inference about the average height does not tell us the specific
height of any one person in that sample.

4. Potential for misuse and misunderstanding: Statistics can be
misused or misunderstood, especially when people do not fully understand the
concepts behind it. Biased data collection, poor methodology, or improper use of
statistical models can all lead to inaccurate conclusions. Such misuse has led to
growing scepticism and mistrust of statistical results.

5. Challenges for Non-experts: Because of its complexity, statistics demands a
certain level of knowledge to be applied effectively. People without prior
training in statistics may find it difficult to grasp
statistical concepts, procedures, and analyses accurately. As statistical methods are complex,
effective and meaningful data interpretation requires the expertise of qualified
professionals, most commonly statisticians.

2. Define Measurement Scales. Discuss Qualitative and Quantitative data in detail with
examples.
Answer 2: Measurement Scale
The term “measurement scale,” also called a scale of measurement or
level of measurement, describes how variables are categorised or classified according to the
properties and nature of the data they represent. Measurement scales determine the kinds of
statistical analyses that can be performed on variables and offer a framework for understanding
their characteristics.
1. Qualitative Data: Qualitative data refers to non-numerical data that characterises the
traits, features, or attributes of a subject. This kind of data is categorical, meaning it represents
different labels or categories. When classifying and categorising information, qualitative data
is frequently utilised instead of numeric observations because it is based on intrinsic qualities.
The two primary categories of qualitative data are ordinal and nominal.
a. Nominal Data: These are categorical data that lack any natural order or
ranking and simply identify distinct categories or groups. The categories are
separate and mutually exclusive. Without implying any quantitative relationship, nominal
data only allow categories to be identified and distinguished. Nominal data examples include:
i. Colours, such as yellow, pink, and white.
ii. Types of animals, such as birds, dogs, and cats.

b. Ordinal Data: Ordinal data are categorical data as well, but they are distinguished from
nominal data by a meaningful ranking or order. Although the order matters, the gaps
between the categories are not consistent or measurable. Ordinal data
make it possible to compare relative positions, but the differences between ranks are not
necessarily equal. Ordinal data examples include:
i. Levels of education (such as high school, college, or graduate).
ii. Ratings of customer satisfaction (such as Excellent, Good, and Fair).

2. Quantitative Data: Quantitative data refers to numerical data that has measurable
values and can be stated numerically. With the use of mathematical computations and
statistical analysis, this kind of data offers a more thorough and accurate consideration of a
phenomenon. The two primary categories of quantitative data are interval and ratio data.
a. Interval Data: Quantitative data with equal intervals between values but no true zero
point on the scale is known as interval data. In other words, a zero value does not
imply the absence of the measured property, even though the numerical differences between
values have meaning. Temperature readings in Celsius or Fahrenheit are typical
instances of interval data. On the Celsius scale, 0 denotes a particular point on the scale
rather than the complete absence of temperature.
b. Ratio Data: Ratio data are quantitative data with equal intervals
between values and a true zero point. A zero value denotes the complete absence of the
attribute being measured, so ratio comparisons between values are
meaningful (for example, a weight of 80 kg is twice a weight of 40 kg).
Examples of ratio data include height, weight, income, and age.

3. Discuss the basic laws of Sampling theory. Define following Sampling techniques with
help of examples:
Stratified Sampling
Cluster Sampling
Answer 3:

Laws of Sampling Theory


The laws and principles of sampling theory give guidance on conducting statistical
sampling properly, so that the results are reliable and valid. The key principles
are as follows:
1. Law of statistical regularity: According to this law, features seen in large, random
samples are likely to mirror those of the population from which the sample was
taken. It highlights that, with sufficiently large and random samples,
statistical patterns are stable and predictable.
2. Principle of inertia of large numbers: According to this principle, a sample's features
converge towards the actual characteristics of the population as the sample size grows.
Larger samples lessen the effect of random fluctuations and are more likely to yield precise
and representative insights.

3. Principle of persistence of small numbers: This principle recognises that there
may be a great deal of variability in small samples and that the observed features may
differ greatly from the population characteristics. It emphasises the need for
caution when extrapolating conclusions from small samples, since they may not
accurately represent the population as a whole.

4. Principle of Validity: This principle emphasises that a sampling method should
accurately measure what it seeks to measure. The reliability and
applicability of the findings depend heavily on validity. Valid sampling techniques
support the goals of the study and offer useful information about the population
being sampled.

5. Principle of Optimisation: This principle promotes the use of efficient and
affordable sampling techniques to obtain accurate data. It entails finding the best
possible balance between the study’s resources and the sample size. By striking the
right balance, researchers can maximise the accuracy of their estimates without
expending needless resources.

Sampling Techniques

1. Stratified Sampling: In stratified sampling, the population is split into
distinct subgroups, or strata, according to particular attributes. Samples are then drawn
at random from each stratum in proportion to its share of the total
population. For example, age-based strata (such as 18–25, 26–35, and 36–45) could be
established for a market research study on smartphone preferences. To obtain a
representative sample, respondents would be chosen at random from each age group.

2. Cluster Sampling: In cluster sampling, the population is divided into groups,
or clusters, and then complete clusters are chosen at random to be included in the sample.
It is especially helpful when the clusters themselves form meaningful units.
For example, a researcher conducting an ecological study may choose particular
habitats (clusters) at random and investigate every kind of plant and animal found there.

3. Multi-stage Sampling: In multi-stage sampling, various sampling techniques are
combined in stages. Usually, larger clusters or groups are chosen first, and then random
sampling is carried out within those clusters or groups. For example, in a multi-country
political poll, the first stage may be the random selection of countries (clusters), and the
second stage could be the random selection of certain regions within each selected
country. In the third stage, households within the selected regions are randomly
sampled.
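
The difference between stratified and cluster sampling can be illustrated with a small Python sketch. Everything in it is hypothetical: the population of 100 respondents, the six habitats, and the helper names are invented purely for illustration, and proportional allocation is assumed for the stratified sample.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 respondents tagged with an age group (stratum)
population = (
    [("18-25", i) for i in range(50)]
    + [("26-35", i) for i in range(30)]
    + [("36-45", i) for i in range(20)]
)

def stratified_sample(units, sample_size):
    """Proportional allocation: each stratum contributes according to its share."""
    strata = {}
    for stratum, unit_id in units:
        strata.setdefault(stratum, []).append((stratum, unit_id))
    sample = []
    for members in strata.values():
        k = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

strat = stratified_sample(population, 10)

# Hypothetical clusters: six habitats, each containing five species
habitats = {f"habitat_{h}": [f"h{h}_species_{s}" for s in range(5)] for h in range(6)}
clus = cluster_sample(habitats, 2)

print("stratified:", strat)
print("cluster:", clus)
```

The stratified sample always contains members of every age group, whereas the cluster sample contains every unit from only two of the six habitats.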

SET – 2
4. Define Business Forecasting. Explain various methods of Business Forecasting.
Answer 4:
BUSINESS FORECASTING
Business forecasting is the methodical process of predicting future trends, events, or business
results using a variety of quantitative and qualitative techniques, historical data, and analysis.
Its aim is to help businesses plan for the future, make informed decisions, and adjust to
expected changes. Forecasting future demand for goods and services
entails assessing market trends, projecting financial results, and taking potential effects on the
business environment into account.

Methods of Business Forecasting
1. Business Barometers: Using business barometers means monitoring and
evaluating leading indicators, that is, economic factors that typically shift before the economy
as a whole. For example, to predict changes in the economy and make sound decisions
about inventory control and marketing tactics, a retail company may monitor consumer
confidence indexes, interest rates, and stock market patterns.

2. Time Series Analysis: Time series analysis examines past data points in sequence
over time in order to spot trends, patterns, and seasonality. A manufacturing
corporation, for example, could analyse quarterly production data spanning several
years to discover repeating patterns and so estimate future production needs more
accurately.

3. Extrapolation: This is a forecasting technique that projects past patterns or
trends into the future without taking underlying factors into account. Assuming that
past trends will continue, one way to forecast future sales growth would be to apply the
average percentage rise from previous years.

4. Regression Analysis: This method examines and measures influencing factors by
analysing the relationship between a dependent variable and one or more independent variables.
A technology business, for example, may use regression analysis to forecast future
sales based on variables such as marketing costs, product attributes, and prevailing
economic conditions.

5. Modern Econometric Methods: These involve modelling and analysing intricate
relationships in economic data using advanced statistical tools. Using econometric
models to estimate inflation rates, for instance, would provide a more sophisticated view
of economic dynamics by taking into account a variety of economic variables and their
interdependencies.

6. Exponential Smoothing Method: This is a time series forecasting technique that
gives more weight to recent observations and progressively less weight to older data
points. A retail chain, for instance, could use exponential smoothing to estimate
weekly sales. This would allow it to capture changing customer trends by
reducing the importance of earlier data points and emphasising recent sales
performance.
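
To make the weighting idea concrete, here is a minimal sketch of simple exponential smoothing in Python. The weekly sales figures and the smoothing constant alpha = 0.5 are hypothetical, and starting the smoothed series at the first observation is one common convention rather than the only option.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # assumed convention: start from the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

weekly_sales = [100, 110, 105, 120]  # hypothetical weekly sales figures
smoothed = exponential_smoothing(weekly_sales, alpha=0.5)

# The last smoothed value serves as the forecast for the next week
forecast_next_week = smoothed[-1]
print(smoothed, forecast_next_week)
```

A larger alpha makes the forecast react faster to recent sales, while a smaller alpha produces a smoother series that changes more slowly.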

5. What is Index number? Discuss the utility of Index numbers.


Answer 5: Index Number
Index number is a metric that expresses the relative movement or change in a set of variables,
usually in relation to a certain base value or base period. Index numbers are used to provide a
standardised measure that makes it easier to compare data from various categories, locations,
or time periods. They also help to simplify and make complex data more interpretable. It is
simpler to identify patterns and variances in the data when other values are compared to the
base value, which is often set at 100.
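
As a small illustration of the base-value idea, the following Python sketch computes a simple (unweighted) price index. The prices and years are hypothetical, and `simple_index` is a helper name introduced here.

```python
def simple_index(values, base_period):
    """Express each value as a percentage of the base-period value (base = 100)."""
    base = values[base_period]
    return {period: value / base * 100 for period, value in values.items()}

prices = {2020: 50.0, 2021: 55.0, 2022: 60.0}  # hypothetical prices of one commodity
index = simple_index(prices, base_period=2020)

for year, value in index.items():
    print(year, round(value, 1))
```

An index of 120 for 2022 would mean the price is 20% above its base-period level.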

Utility of Index Number

Index numbers are useful in a variety of industries and have multiple uses. Several important
facets of their usefulness are as follows:
1. Comparison of Time Periods: Index numbers make it possible to compare
variables or data points from various time periods. Analysts can evaluate the
proportional changes or trends over time by defining a base period or base value.

2. Comparative analysis: By providing a standardised metric, index numbers
enable comparative analysis. They offer a common framework
for evaluating the performance of various locations, industries, or categories.

3. Inflation and Price Changes: Inflation and price fluctuations are commonly
gauged using index numbers. Examples are the Consumer Price Index (CPI) and the Producer
Price Index (PPI), which measure changes in the cost of living and in
producer-level prices, respectively.

4. Economic Indicators: Index numbers are frequently used in leading economic
indicators, such as stock market indexes and economic performance indicators. These
metrics guide decision-making and offer a quick overview of the state of the economy.

5. Performance Measurement: Index numbers are used by businesses to evaluate the
performance of their stocks, goods, and services. Stock indexes, like the S&P 500, are
used as benchmarks to assess the performance of the market as a whole.

6. Discuss various types of Estimators. Also explain the criteria of a good estimator.
Answer 6: TYPES OF ESTIMATORS
The following are the two types of estimators:
1. Point Estimator: An estimator that gives a single-value estimate for the
parameter of interest is called a point estimator. With the goal of coming as close as
possible to the actual value of the population parameter, it reduces the data from a sample
to a single numerical value. The sample mean, sample variance, and sample proportion are
typical examples. Point estimators are simple and easy to understand, but they carry no
information about the variability or reliability of the estimate.

2. Interval Estimator: An interval estimator, which produces what is usually called a
confidence interval, gives a range of values within which the true value of
the parameter is likely to fall, together with a degree of confidence. Interval estimators,
in contrast to point estimators, provide a range of possible values rather than a single
estimate. The interval is stated with a corresponding confidence level, for example a
95% confidence interval. This method conveys the accuracy
and uncertainty of the estimation procedure. Interval estimators are especially helpful
when evaluating the reliability of point estimates and communicating the possible
variability in the estimate of the population parameter.

Criteria for a Good Estimator


1. Unbiasedness: An estimator is deemed unbiased if, on average, it produces
estimates that match the true population parameter. In other words, the estimator's
expected value and the parameter's true value are the same. Unbiased estimators improve the
reliability of the estimation process by avoiding systematic errors and ensuring that the
estimator remains centred around the right value over repeated sampling.

2. Efficiency: An estimator's efficiency is determined by how well it makes use of
the data that is available from the sample. An efficient estimator has a small variance,
which means less variability in its estimates. Among unbiased estimators, the
efficient one has the smallest variance (and hence the smallest mean squared error), which
makes it a more accurate and dependable instrument for parameter estimation.

3. Sufficiency: A sufficient estimator contains all the information about the parameter
that the sample provides. Stated differently, once the sufficient statistic is known, the
rest of the sample reveals nothing new about the parameter. When the estimator is
sufficient, it avoids duplication and captures all pertinent information, resulting in
more efficient and insightful estimation.

4. Consistency: This refers to the property whereby the estimator converges to the
true parameter value as the sample size grows. To put it simply, the accuracy of a
consistent estimator increases with the number of data points it uses. Consistency
enhances the reliability of the estimation process, because it guarantees that the
estimator becomes a more credible tool for estimating the true population parameter
as the sample gets bigger.
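
The unbiasedness criterion can be illustrated with a small exact calculation in Python. The three-value population is hypothetical and chosen only to keep the enumeration short. The sketch lists every possible sample of size 2 drawn with replacement and compares the average of two variance estimators (dividing by n − 1 and by n) with the true population variance.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
mu = Fraction(sum(population), len(population))
pop_var = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - m) ** 2 for x in sample) / divisor

# Every equally likely sample of size 2, drawn with replacement
samples = list(product(population, repeat=2))

expected_unbiased = sum(sample_variance(s, len(s) - 1) for s in samples) / len(samples)
expected_biased = sum(sample_variance(s, len(s)) for s in samples) / len(samples)

print("population variance:", pop_var)
print("mean of (n - 1) estimator:", expected_unbiased)
print("mean of n estimator:", expected_biased)
```

The estimator that divides by n − 1 averages exactly to the population variance, while the one that divides by n systematically underestimates it.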
