SM For Statistics
Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their
clients.
Finance
Financial advisors use a variety of statistical information to guide their investment
recommendations.
Marketing
Electronic scanners at retail checkout counters are being used to collect data for a variety of
marketing research applications.
Production
Today’s emphasis is on quality, which is of utmost importance in production. A variety of
statistical quality control charts are used to monitor the average output of a production
process.
Economics
Economists are frequently asked to provide forecasts about the future of the economy. They
use a variety of statistical information in making such forecasts. For example, in forecasting
the inflation index, economists use statistical information on indicators such as the producer
price index, the unemployment rate and manufacturing capacity utilisation.
In the field of medicine, statistical tools like t-tests are used to test the efficacy of a new
drug or medicine. In the field of economics, statistical tools such as index numbers, estimation
theory and time series analysis are used in solving economic problems related to wages, price,
production and distribution of income. In the field of agriculture, an important statistical
concept, analysis of variance (ANOVA), is used in experiments to test the significance of
differences among sample means.
Statistics is a part of Economics, Commerce and Business. Statistical analysis of the variations
in price, demand and production is helpful to both businessmen and economists. Cost of
living index numbers help governments in economic planning and fixation of wages. A
government’s administrative system depends heavily on production statistics, income
statistics, labour statistics, and economic indices of cost and price. Economic planning of any
nation is based on statistical facts. Cost of living index numbers are also used to
estimate the value of money. In business activities, analysis of demand, price, production cost,
and inventory costs helps in decision making.
UNIT 2
2. Enumerate the factors which should be kept in mind for proper planning.
Ans.2
1. Objective and Scope:
- Clearly define survey purpose and objectives.
- Determine the scope of the study.
2. Population and Sampling:
- Identify target population.
- Choose a representative sample.
3. Sampling Size:
- Determine an appropriate sample size for significance.
4. Sampling Frame:
- Develop a comprehensive list of the population.
5. Survey Design:
- Choose appropriate survey methods.
- Design clear and unbiased questions.
6. Data Collection Tools:
- Select suitable data collection tools.
- Train surveyors for consistency.
7. Pilot Testing:
- Test survey instruments for issues.
- Evaluate clarity and effectiveness.
8. Data Analysis Plan:
- Determine analysis methods.
- Plan for handling incomplete data.
9. Timeline:
- Develop a realistic survey timeline.
10. Budget:
- Estimate financial resources required.
11. Ethical Considerations:
- Ensure compliance with ethical standards.
- Address potential biases.
12. Quality Control:
- Implement measures for data quality.
- Include validation checks.
13. Data Security:
- Protect confidentiality and security of data.
14. Reporting and Dissemination:
- Plan for presenting and disseminating results.
- Tailor reporting methods for the audience.
4. Distinguish between:
a) Primary and secondary data
b) Direct and indirect investigation
c) Questionnaire and schedule
Ans.4 a) Data collected for the first time by the investigator is primary data. Data collected by
some other persons but used by the investigator for his/her study is known as secondary data.
b) Direct investigations are carried out directly by the investigator. Investigation conducted
through mail questionnaire is called indirect investigation.
c) Questionnaires contain simple questions and are filled by respondents. Schedules also
contain questions, but responses are recorded directly by the investigator.
UNIT 3
1. Form frequency distribution for the following data regarding weight of 50 people.
Table 3.48: Data regarding weight of 50 people
50 72 61 64 72 62 61 56 75 55
52 71 54 64 71 64 59 59 70 54
60 60 57 57 66 68 60 62 68 54
62 65 58 64 65 60 60 67 58 56
70 62 60 68 64 62 59 69 52 58
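One way to form the frequency distribution is sketched below. The class width of 5 starting at 50 is an assumption chosen for illustration (the question does not fix the classes), and the function name is hypothetical.

```python
from collections import Counter

# Weights of the 50 people from Table 3.48
weights = [
    50, 72, 61, 64, 72, 62, 61, 56, 75, 55,
    52, 71, 54, 64, 71, 64, 59, 59, 70, 54,
    60, 60, 57, 57, 66, 68, 60, 62, 68, 54,
    62, 65, 58, 64, 65, 60, 60, 67, 58, 56,
    70, 62, 60, 68, 64, 62, 59, 69, 52, 58,
]

def frequency_distribution(data, start, width):
    """Return (lower, upper, count) rows for classes [lower, upper)."""
    counts = Counter((value - start) // width for value in data)
    n_classes = max(counts) + 1
    return [
        (start + i * width, start + (i + 1) * width, counts.get(i, 0))
        for i in range(n_classes)
    ]

# Assumed classes: 50-55, 55-60, ... (upper limit excluded)
table = frequency_distribution(weights, start=50, width=5)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

A different class width or starting point would give an equally valid distribution; only the total of 50 is fixed by the data.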
2. Junior executive of XYZ Company has prepared budget for a new division of the
company. Table 3.49 depicts the budget data. Vice president of the company wanted to
see the summary of the budget in a diagrammatic form. Prepare a pie diagram.
Table 3.49: Budget of XYZ Company
Ans.2
Capital Investment
Raw material
R&D
Misc.
3. ABC Ice Cream Company attempts to keep all its ten flavours of ice cream in stock at
each of its stores. In-charge of stores operation collects data on the daily amount of each
flavour to the nearest half gallon.
4. Table 3.50 depicts certain data. Construct histogram for this data.
Ans.4 Below figure depicts a histogram diagram for the given data:
5. Association of real estate sellers has collected data on a sample of 100 people with
respect to the monthly commission earned by them. Table 3.51 depicts certain data.
Construct an ogive. Find:
i. What proportion of salespeople earn more than 25,000
ii. What proportion earn between 15,000 and 25,000.
Ans.5
Earnings   Frequency   Cumulative Frequency
>5000      5           5
>10000     9           14
>15000     13          27
>20000     30          57
>25000     27          84
>30000     16          100
TOTAL      100
[Figure: ogive (cumulative frequency curve) for the earnings data]
i. 16%
ii. 57%
UNIT 4
1. In an office there are 84 employees. Their salaries in Indian rupees are as given in
table. Find the mean salary per day.
Table 4.60: Salaries of 84 Employees
Salary/Day 60 70 80 90 100 120
Employees 3 5 8 10 4 2
2. A survey of 128 smokers gave the results represented in table 4.61, which are frequency
distributions of smokers’ daily expenses on smoking. Find the mean expenses and
standard deviation. Determine coefficient of variation.
Table 4.61: Survey Results of 128 Smokers
Ans.2
Class Interval   Frequency (f)   Mid-point (X)   fX     Deviation (D = X − X̄)   D²        f·D²
10-20            23              15              345    -16.64                   276.88    6368.46
20-30            44              25              1100   -6.64                    44.08     1939.52
30-40            35              35              1225   3.36                     11.28     394.8
40-50            12              45              540    13.36                    178.48    2141.76
50-60            9               55              495    23.36                    545.68    4911.12
60-70            3               65              195    33.36                    1112.88   3338.64
70-80            2               75              150    43.36                    1880      3760
TOTAL            Σf = 128                        ΣfX = 4050                                Σf·D² = 22854.3
Mean: $\bar{X} = \frac{\sum fX}{\sum f} = \frac{4050}{128} = 31.64$
Standard Deviation: $\sigma = \sqrt{\frac{\sum f D^2}{\sum f}} = \sqrt{\frac{22854.3}{128}} = \sqrt{178.55} = 13.36$
Coefficient of Variation: $CV = \frac{\sigma}{\bar{X}} \times 100 = \frac{13.36}{31.64} \times 100 \approx 42.2\%$
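The grouped-data mean, standard deviation and coefficient of variation can be checked with a short sketch. The function name is hypothetical, and the population form of the variance (dividing by Σf) is assumed.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for the survey of 128 smokers
classes = [
    (10, 20, 23), (20, 30, 44), (30, 40, 35), (40, 50, 12),
    (50, 60, 9), (60, 70, 3), (70, 80, 2),
]

def grouped_stats(classes):
    """Mean, standard deviation and CV (%) computed from class mid-points."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(x * f for x, f in mids) / n
    variance = sum(f * (x - mean) ** 2 for x, f in mids) / n
    sd = sqrt(variance)
    return mean, sd, 100 * sd / mean

mean, sd, cv = grouped_stats(classes)
print(round(mean, 2), round(sd, 2), round(cv, 2))
```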
3. For the distribution shown in table 4.62, find the median and mode.
% Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of Smokers 4 9 19 20 18 7 8
Ans.3
% Marks   No. of Smokers (f)   Cumulative Frequency (cf)   Mid-point (x)   f·x
0-10      4                    4                           5               20
10-20     9                    13                          15              135
20-30     19                   32                          25              475
30-40     20                   52                          35              700
40-50     18                   70                          45              810
50-60     7                    77                          55              385
60-70     8                    85                          65              520
TOTAL     N = Σf = 85                                                      Σfx = 3045
Now, N = 85, so $\frac{N}{2} = 42.5$.
The cumulative frequency just greater than 42.5 is 52, and the corresponding class is 30-40.
Therefore, l = 30, h = 10, N = 85, f = 20, and cf = 32 (the cumulative frequency of the class preceding the median class).
Median: $l + \frac{\frac{N}{2} - cf}{f} \times h = 30 + \frac{42.5 - 32}{20} \times 10 = 30 + 5.25 = 35.25$
Mean = 3045 / 85 = 35.82
Mode = 3(median) – 2(mean)
= 3(35.25) – 2(35.82)
= 34.11
4. Find the geometric mean of the following distribution given in table.
X 110 115 118 119 120
f 4 11 21 6 2
Ans.4
x       f       log(x)   f·log(x)
110     4       2.0413   8.1652
115     11      2.0606   22.6666
118     21      2.0718   43.5078
119     6       2.0755   12.453
120     2       2.0791   4.1582
TOTAL   N = 44           Σf·log(x) = 90.9508
$G.M. = \text{Antilog}\left\{\frac{\sum f \log x}{N}\right\}$
$= \text{Antilog}\left\{\frac{90.9508}{44}\right\}$
$= \text{Antilog}\{2.067\}$
$= 116.7$
5. Find the harmonic mean of the following distribution given in table.
𝒙 121 122 123 124 125
f 5 25 36 37 20
Ans.5
x       f          f/x
121     5          0.0413
122     25         0.2049
123     36         0.2927
124     37         0.2984
125     20         0.1600
TOTAL   Σf = 123   Σ(f/x) = 0.9973
Harmonic Mean: $\frac{N}{\sum (f/x)} = \frac{123}{0.9973} = 123.33$
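The weighted geometric and harmonic means in the two answers above can be recomputed with the following sketch. The helper names are hypothetical, and base-10 logarithms are used to mirror the log/antilog working.

```python
from math import log10

def geometric_mean(values, freqs):
    """Antilog of the frequency-weighted mean of log10(x)."""
    n = sum(freqs)
    return 10 ** (sum(f * log10(x) for x, f in zip(values, freqs)) / n)

def harmonic_mean(values, freqs):
    """N divided by the sum of f/x."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

gm = geometric_mean([110, 115, 118, 119, 120], [4, 11, 21, 6, 2])
hm = harmonic_mean([121, 122, 123, 124, 125], [5, 25, 36, 37, 20])
print(round(gm, 2), round(hm, 2))
```

As a sanity check, a harmonic mean can never exceed the arithmetic mean of the same data, which here is about 123.34.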
6. Given that the sum of upper and lower quartiles is 122 and their difference
is 23, find the quartile deviation of the series.
Ans.6 Quartile Deviation (Q.D) = (Q3 – Q1) / 2
= 23 / 2
= 11.5
7. If Coefficient of Variation = 22 and S.D. = 4, find the mean.
Ans.7 Given, Coefficient of variation = 22
Standard Deviation = 4
We know, Coefficient of Variation = $\frac{S.D.}{Mean} \times 100$
$22 = \frac{4}{Mean} \times 100$
Mean = 400 / 22
Mean = 18.18
8. The table shows the distribution of age at the time of first delivery of 65 women. Find
mean deviation from mean and median.
UNIT 5
1. Define independent events.
Ans. 1 Two events are said to be independent of each other if the occurrence of one does not
affect the probability of occurrence of the other.
Example: If we roll a die twice, the outcome of the first roll and second roll has no effect on
each other – they are independent.
2.The probability of Mr. Sunil solving a problem is ¾. The probability of Mr. Anish
solving is ¼. What is the probability that a given problem will be solved?
Ans.2
Probability of Mr. Sunil solving a problem P(A) = 3/4
Probability of Mr. Anish solving a problem P(B) = 1/4
P(Ā) = 1 – 3/4 = 1/4
P(B̄) = 1 – 1/4 = 3/4
Required Probability = P(A ∪ B), treating the two attempts as independent
= 1 – P(Ā) P(B̄)
= 1 – (1/4).(3/4)
= 1 – 3/16
= 13/16
3. The probability that a contractor will get an electrical job is 0.8, he will get a plumbing
job is 0.6 and he will get both 0.48. What is the probability that he gets at least one job?
Are the probabilities of getting electrical job and plumbing job independent?
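Ans.3 Let,
A: Contractor gets an electrical contract
B: Contractor gets a plumbing contract
Then, P(A) = 0.8, P(B) = 0.6 and P(A ∩ B) = 0.48
P(at least one job) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.8 + 0.6 – 0.48 = 0.92
Since P(A) × P(B) = 0.8 × 0.6 = 0.48 = P(A ∩ B), the events of getting the electrical job and the plumbing job are independent.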
4. A box contains 4 red and 5 blue similar rings. What is the probability of selecting at
random two rings:
i. having same colour
ii. having different colours
Ans.4
i. Probability of selecting 2 red rings: 4C2/9C2
$= \frac{\frac{4!}{2!(4-2)!}}{\frac{9!}{2!(9-2)!}} = \frac{\frac{4 \times 3}{2 \times 1}}{\frac{9 \times 8}{2 \times 1}} = \frac{6}{36} = \frac{1}{6}$
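The working above covers only the two-red case. A minimal sketch of both parts of the question, counting combinations with exact fractions, is given below; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, blue = 4, 5
total_pairs = comb(red + blue, 2)  # 9C2 = 36 ways to pick two rings

# i. Same colour: both red or both blue
p_same = Fraction(comb(red, 2) + comb(blue, 2), total_pairs)

# ii. Different colours: one red and one blue
p_different = Fraction(red * blue, total_pairs)

print(p_same, p_different)
```

The two probabilities are complementary, so they sum to 1.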
6. The probability that a company A will survive for 20 years is 0.6. The probability that
its sister concern will survive for 20 years is 0.8. What is the probability that at least one
of them will survive for 20 years?
Ans.6
A: Company A will survive for 20 years
B: Company B will survive for 20 years
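Then, P(A) = 0.6, P(B) = 0.8 and, assuming the two companies survive independently, P(A ∩ B) = 0.6 × 0.8 = 0.48
P(at least one survives) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.6 + 0.8 – 0.48 = 0.92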
7. A recently developed car has two important components A and B. The probability of
failure of A and B are 0.2 and 0.1. What is the probability that the car will fail?
Ans.7
A: Failure of Component A
B: Failure of Component B
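Then, P(A) = 0.2, P(B) = 0.1 and, assuming the components fail independently, P(A ∩ B) = 0.2 × 0.1 = 0.02
If the car fails when either component fails, P(car fails) = P(A ∪ B) = 0.2 + 0.1 – 0.02 = 0.28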
8. The probability that a football player will play on ordinary ground is 0.6 and on green
turf is 0.4. The probability that he will get knee injury when playing an ordinary ground
is 0.07 and that on green turf is 0.04. What is the probability that he got a knee-injury
due to the play on ordinary ground?
Ans.8
P(OG) = 0.6
P(GT) = 0.4
P (Knee Injury | OG) = 0.07
P (Knee Injury | GT) = 0.04
By Bayes' theorem,
P (OG | Knee Injury) = P(OG) × P (Knee Injury | OG) / [P(OG) × P (Knee Injury | OG) + P(GT) × P (Knee Injury | GT)]
= (0.6 * 0.07) / ((0.6*0.07) + (0.4*0.04))
= 42 / 58
= 21 / 29
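The same Bayes' theorem calculation can be written as a small sketch with exact fractions. The function name and argument layout are illustrative, not part of the original solution.

```python
from fractions import Fraction

def posterior(priors, likelihoods, index):
    """P(cause[index] | event) by Bayes' theorem for mutually exclusive causes."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return joint[index] / sum(joint)

priors = [Fraction("0.6"), Fraction("0.4")]         # P(OG), P(GT)
likelihoods = [Fraction("0.07"), Fraction("0.04")]  # P(injury | OG), P(injury | GT)

p_og_given_injury = posterior(priors, likelihoods, 0)
print(p_og_given_injury)
```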
UNIT 6
1. What are the assumptions under which binomial distribution is applied?
Ans.1 The following are assumptions under which a Binomial distribution can be applied:
i) The outcome of an experiment should be of dichotomous nature. In the Bernoulli process,
there must be only two possible outcomes on each trial, such as ‘success’ or ‘failure’, ‘yes’ or
‘no’, ‘defective’ or ‘not defective’, ‘male’ or ‘female’, ‘pass’ or ‘fail’, ‘favourable’ or
‘unfavourable’, etc. In this experiment, the probability of success is denoted by ‘p’ and
probability of failure is denoted by ‘q’.
ii) The probability of success should remain the same across the experiments. Irrespective of
the number of times the experiment is conducted, the probability of success should be same
for all the trials of the experiment. For example, the probability of getting a head is always 0.5
irrespective of the number of times a fair coin is tossed.
iii) Experiments should be conducted under identical conditions. There should not be any
change in conditions while conducting binomial experiments. Any change in conditions only
leads to incorrect conclusions for the given experiment.
iv) Experiments should be statistically independent. We can apply a Binomial distribution only
when the events in an experiment are statistically independent, which means occurrence of
one event does not affect the occurrence of other event.
2. Find P(X = 2), given mean and standard deviation of the binomial distribution are 4
and 3 respectively.
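Ans.2 Given, Mean (np) = 4 and, taking 3 as the variance, npq = 3. (A standard deviation of 3 would give a variance of 9, which cannot exceed the mean of a binomial distribution.)
$q = \frac{\text{variance}}{\text{mean}} = \frac{3}{4}$
$p = 1 - q = 1 - \frac{3}{4} = \frac{1}{4}$
$n = \frac{\text{mean}}{p} = \frac{4}{1/4} = 16$
The probability mass function (PMF) for a binomial distribution:
$P(X = k) = {}^{n}C_{k} \, p^{k} \, q^{n-k}$
For P(X = 2):
$P(X = 2) = {}^{16}C_{2} \left(\frac{1}{4}\right)^{2} \left(\frac{3}{4}\right)^{14} = 120 \times (0.25)^{2} \times (0.75)^{14} \approx 0.1336$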
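A short sketch of the binomial PMF, evaluated for n = 16 and p = 1/4 (the values obtained when 3 is read as the variance), may help confirm the final figure. The function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_two = binomial_pmf(2, 16, 0.25)
print(round(p_two, 4))
```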
4. Give real life examples of Poisson variate.
Ans.4 The number of customers arriving at a bank or a service counter in a fixed time.
Assuming customers arrive independently and at a constant average rate, the Poisson
distribution can help estimate the probability of a specific number of arrivals.
5. If the first two terms of a Poisson distribution are 150 and 90, find P (X = 0).
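Ans.5 The probability mass function (PMF) of a Poisson distribution is given by:
$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$
where X is the random variable, k is the number of events, and λ is the average rate of events.
The first two terms given, 150 and 90, are expected frequencies, that is, N times the corresponding probabilities, where N is the total frequency.
The first term (k = 0) gives us:
$N \cdot P(X = 0) = N \cdot \frac{e^{-\lambda} \lambda^{0}}{0!} = N e^{-\lambda} = 150$
The second term (k = 1) gives us:
$N \cdot P(X = 1) = N \cdot \frac{e^{-\lambda} \lambda^{1}}{1!} = N \lambda e^{-\lambda} = 90$
Dividing the second equation by the first, we get:
$\frac{N \lambda e^{-\lambda}}{N e^{-\lambda}} = \frac{90}{150}$
$\lambda = \frac{3}{5} = 0.6$
$P(X = 0) = e^{-\lambda} = e^{-0.6} = 0.5488$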
6. The average number of phone calls at a booth per hour is 2. What is the probability
that there will be exactly one call in an hour?
Ans.6 Using the PMF, $P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$. Given, k = 1; λ = 2
$P(X = 1) = \frac{e^{-2} \cdot 2^{1}}{1!} = \frac{2 e^{-2}}{1} = \frac{2}{e^{2}} = \frac{2}{7.3891} = 0.2707$
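The Poisson probabilities used in the last two answers can be checked with a minimal sketch; the function name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

p_zero = poisson_pmf(0, 0.6)     # lambda = 0.6, no events
p_one_call = poisson_pmf(1, 2)   # lambda = 2 calls per hour, exactly one call
print(round(p_zero, 4), round(p_one_call, 4))
```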
7. The probability that a firm’s product will succeed its competitor’s product is 2/3. If in
a month it has introduced 4 products, what is the probability that:
i) Two products succeed the competitor’s product?
ii) All products succeed the competitor’s product?
Ans.7
P (Firm’s product will succeed its competitor’s product) = 2/3
P (Firm’s product will not succeed its competitor’s product) = 1/3
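The answer above stops after stating p and q. Assuming the four products succeed or fail independently, both parts follow from the binomial formula; the sketch below uses exact fractions and illustrative variable names.

```python
from fractions import Fraction
from math import comb

p = Fraction(2, 3)  # probability that a product succeeds
q = 1 - p           # probability that it does not
n = 4               # products introduced in the month

# i) exactly two of the four products succeed
p_two = comb(n, 2) * p ** 2 * q ** 2

# ii) all four products succeed
p_all = p ** n

print(p_two, p_all)
```

In decimals these are roughly 0.296 and 0.198.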
8. Mean life of electric bulbs produced by a company is 1500 hours with a standard
deviation of 300 hours. Assuming that the life of bulbs follows normal distribution,
what is the probability that a randomly selected bulb will:
i) Fail within 1200 hours?
ii) Survive between 1350 and 1650 hours?
iii) Survive beyond 1950 hours?
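Ans.8
Given:
Mean (μ) = 1500 hours
Standard Deviation (σ) = 300 hours
i) Probability that a bulb fails within 1200 hours:
$z = \frac{x - \mu}{\sigma} = \frac{1200 - 1500}{300} = -1$
The standard normal distribution table value for Z = −1 is approximately 0.1587, so
P (Fail within 1200 hours) = 0.1587
ii) Probability that a bulb survives between 1350 and 1650 hours:
$z_1 = \frac{x - \mu}{\sigma} = \frac{1350 - 1500}{300} = -0.5$
$z_2 = \frac{x - \mu}{\sigma} = \frac{1650 - 1500}{300} = 0.5$
The standard normal distribution table values for Z = −0.5 and Z = 0.5 are approximately 0.3085 and
0.6915, respectively. The required probability is approximately 0.6915 - 0.3085 =
0.3830.
iii) Probability that a bulb survives beyond 1950 hours:
$z = \frac{x - \mu}{\sigma} = \frac{1950 - 1500}{300} = 1.5$
The standard normal distribution table value for Z = 1.5 is approximately 0.9332. To find the
probability of surviving beyond 1950 hours, we need to subtract this value from 1:
P (Survive beyond 1950 hours) = 1 - 0.9332 = 0.0668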
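The table look-ups for the bulb-life question can be reproduced with the error function from the standard library. This is a sketch with an illustrative helper name, not a replacement for the normal table.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 1500, 300
p_fail_1200 = normal_cdf(1200, mu, sigma)
p_between = normal_cdf(1650, mu, sigma) - normal_cdf(1350, mu, sigma)
p_beyond_1950 = 1 - normal_cdf(1950, mu, sigma)
print(round(p_fail_1200, 4), round(p_between, 4), round(p_beyond_1950, 4))
```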
9. The height of students follows Normal distribution. 15% of them have height less
than 150 cm and 10 % have height above 180 cm. Find the mean and standard deviation
of the distribution?
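Ans.9
Given:
1. P (X < 150) = 0.15 (15% have height less than 150 cm)
2. P (X > 180) = 0.10 (10% have height above 180 cm)
Using Z-scores:
1. For the first condition, the standard normal value with 15% of the area below it is approximately −1.04:
$-1.04 = \frac{150 - \mu}{\sigma}$
$\mu = 150 + 1.04 \cdot \sigma$
2. For the second condition, the standard normal value with 10% of the area above it is approximately 1.28:
$1.28 = \frac{180 - \mu}{\sigma}$
Substituting μ from the first condition:
$1.28 \cdot \sigma = 180 - 150 - 1.04 \cdot \sigma$
$2.32 \cdot \sigma = 30$
$\sigma = \frac{30}{2.32} = 12.93$
$\mu = 150 + 1.04 \times 12.93 = 163.45$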
UNIT 7
1. Discuss the errors that arise in statistical survey.
Ans.1 There are four types of error:
1. Sampling errors: The sample results are bound to differ from population results, since a
sample is only a small portion of the population. Sampling error is also known as inherent
error and cannot be avoided. It is not worthwhile to try to eliminate it completely. These
errors may be due to the following factors:
• Faulty selection of sample
• Substitution of units to be studied
• Faulty demarcation of sampling units
• Error due to bias in estimation
However, the sampling errors follow random or chance variations and tend to cancel out
each other on averaging.
2. Non-sampling errors: Non-sampling errors are attributed to factors that can be controlled
and eliminated by suitable actions. They are due to the following factors:
• Faulty planning, faulty definitions
• Defective methods of interviewing
• Personal bias of investigator
• Lack of trained and qualified investigators
• Respondents' failure to answer
• Improper coverage
• Compiling errors
• Publication errors
It is worthwhile to eliminate these errors.
3. Biased errors: Biased errors arise in both census and sampling methods. These errors occur
due to personal bias of the investigator and the instruments used for measuring. They are also
due to faulty collection of data, respondent’s bias and bias due to non-response. Biased errors
have a tendency to grow with sample size. Therefore, they are also known as cumulative errors.
The magnitude of biased errors is directly proportional to the sample size.
4. Unbiased errors: The errors that are due to over-estimation and under-estimation, such that
they are equal are known as unbiased errors. They are also known as compensatory errors.
They do not increase with sample size.
• Lottery method – In lottery method, we identify each and every unit with distinct numbers
by allotting an identical card. The cards are put in a drum and thoroughly shuffled before each
unit is drawn. The figure depicts a lotto machine through which samples can be selected
randomly.
• The use of a table of random numbers – There are several random number tables, such as
Tippett’s random number table, Fisher and Yates’ tables, Kendall and Babington Smith’s
random tables, and Rand Corporation random numbers. The table depicts a specimen of
random numbers by Tippett.
2. Principle of inertia of large numbers: This principle states that “other things being equal,
as the sample size increases, the results tend to be more reliable and accurate”. Suppose that
the population mean is 25 units. If a sample of size 50 gives an average of 24.5 units, then a
larger sample of size 100 can be expected to give an average closer to 25, say 24.8 units. In
other words, the larger the sample size, the more accurate the result tends to be.
6. Explain the sampling distribution of a statistic and its standard error.
Ans.6
Sampling Distribution of a Statistic: When we collect a sample from a population and
calculate a statistic (e.g., mean, proportion, standard deviation) based on that sample, the value
we obtain is just one possible realization of that statistic. If we were to repeat this process with
multiple samples from the same population and calculate the statistic for each sample, we
would create a distribution of those statistics. This distribution is called the sampling
distribution of the statistic.
For example, if we are interested in the sample mean (𝑥̅), the sampling distribution of the
sample mean would show all possible values of 𝑥̅that could be obtained from different samples
of the same size.
Standard Error of a Statistic: The standard error (SE) of a statistic measures the variability
of the sampling distribution of that statistic. It provides an estimate of how much the sample
statistic is expected to vary from the true population parameter.
For the sample mean (x̄), the standard error (often denoted as SE(x̄) or σ_x̄) is calculated as
the standard deviation of the population divided by the square root of the sample size:
$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
Where:
• σ is the population standard deviation.
• n is the sample size.
This formula indicates that larger sample sizes result in smaller standard errors, meaning that
the sample mean is more likely to be close to the population mean.
Ans.7 Let's use the given distribution of employees in three manufacturing plants and draw a
random sample of size 15 using random numbers.
Distribution:
• Plant 0-5: 4 employees
• Plant 5-10: 6 employees
• Plant 10-15: 10 employees
So, based on the random numbers generated, you would select 3 employees from the 0-5
category, 6 employees from the 5-10 category, and 6 employees from the 10-15 category,
resulting in a total random sample of size 15.
8. Population proportion of tea drinkers is 0.6. Determine the sample size such that the
error between actual and observed proportion will be less than or equal to 0.05 with 95
% confidence, (Z = 1.96).
Ans.8
Given:
• Z=1.96 (for a 95% confidence level)
• p = 0.6 (population proportion)
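• E = 0.05 (margin of error)
$n = \frac{z^{2} \times p \times (1 - p)}{E^{2}}$
$n = \frac{(1.96)^{2} \times 0.6 \times (1 - 0.6)}{(0.05)^{2}} = \frac{0.9220}{0.0025} = 368.79$
Rounding up, the required sample size is n = 369.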
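The sample-size formula for a proportion can be evaluated with a short sketch. The function name is illustrative, and the result is rounded up because a sample size must be a whole number.

```python
from math import ceil

def sample_size_for_proportion(z, p, margin):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_required = sample_size_for_proportion(z=1.96, p=0.6, margin=0.05)
print(n_required)
```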
UNIT 8
1. XYZ bank is determining the number of tellers available during the Friday lunch rush
hour. The bank has collected data on the number of people who entered the bank during
the past three months, on Fridays from 11 am to 1 pm. Using the data from table 8.6,
find the point estimates of the mean and standard deviation of the population from which
the sample was drawn.
Table 8.6: Data of the Number of People entered into XYZ Bank
242 275 289 306 342 385
279 245 269 305 294 328
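Ans.1
$\bar{x} = \frac{\sum x_i}{n} = \frac{242+275+289+306+342+385+279+245+269+305+294+328}{12} = \frac{3559}{12} = 296.58$
$s^{2} = \frac{\sum (x_i - \bar{x})^{2}}{n - 1} = \frac{18266.92}{11} = 1660.63$
Standard deviation (s) = $\sqrt{1660.63} = 40.751$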
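The point estimates can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor appropriate for a sample.

```python
from statistics import mean, stdev, variance

# Number of people entering the bank on 12 Fridays (Table 8.6)
counts = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]

x_bar = mean(counts)      # point estimate of the population mean
s2 = variance(counts)     # sample variance (divisor n - 1)
s = stdev(counts)         # point estimate of the population standard deviation
print(round(x_bar, 2), round(s2, 2), round(s, 3))
```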
2. From a population known to have standard deviation of 1.4, a sample of 60 individuals
is taken. The mean of this sample is found to be 6.2.
i) Find the standard error of the mean.
ii) Establish an interval estimate around the sample mean using one standard deviation
of the mean.
Ans.2
Given:
- Population standard deviation (𝜎) = 1.4
- Sample size (n) = 60
- Sample mean (𝑥̅) = 6.2
i) Standard Error of the Mean (SEM):
$SEM = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{60}} = \frac{1.4}{7.746} = 0.181$
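For this question (σ = 1.4, n = 60, sample mean 6.2), an interval of one standard error on either side of the sample mean can be sketched as follows; the variable names are illustrative.

```python
from math import sqrt

sigma, n, x_bar = 1.4, 60, 6.2

sem = sigma / sqrt(n)                    # standard error of the mean
interval = (x_bar - sem, x_bar + sem)    # one standard error either side
print(round(sem, 3), tuple(round(v, 3) for v in interval))
```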
ii. For a 99% confidence level, the Z-score (Z) is approximately 2.576.
Confidence Interval (99%) = $112.4 \pm 2.576 \left(\frac{13.7}{\sqrt{250}}\right) = 112.4 \pm 2.234$
99% Confidence Interval: (110.166, 114.634)
UNIT 9
1. Twenty households out of 1000 were using Brand ‘A’ toothpaste. The company
increased the price of the brand. In a survey, they found that only 12 households out of
1000 are using it now. Can we conclude at 5% level of significance that proportion of
users has decreased?
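Ans.1.
Given:
- p1 = 20/1000 = 0.02 (proportion before),
- p2 = 12/1000 = 0.012 (proportion after),
- n1 = n2 = 1000 (sample sizes),
- Significance level α = 0.05
Pooled proportion: p = (20 + 12) / (1000 + 1000) = 0.016
$SE = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.016 \times 0.984 \times 0.002} \approx 0.0056$
z-test statistic:
$z = \frac{p_1 - p_2}{SE}$
$z = \frac{0.02 - 0.012}{0.0056} = \frac{0.008}{0.0056}$
$z \approx 1.43$
The critical value for a one-tailed test at a 5% level of significance is 1.645. Since 1.43 < 1.645, we do not reject the null hypothesis; the data do not show that the proportion of users has decreased.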
2. A drill drills holes with standard deviation of depth 0.03cms. It is adjusted to drill holes
of depth 5.5cm. For 50 holes drilled, the mean depth is 5.503cm. Test at 5% level of
significance whether the adjustment is correct.
Ans.2
Given:
- Population standard deviation (𝜎) = 0.03 cm,
- Sample mean (x̅) = 5.503 cm,
- Sample size (n) = 50,
- Hypothesized population mean (𝜇) = 5.5 cm,
- Significance level 𝛼 = 0.05.
z-test statistic:
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{5.503 - 5.5}{0.03 / \sqrt{50}} = \frac{0.003}{0.00424} = 0.707$
Compare with the critical value:
The critical value for a two-tailed test at a 5% level of significance is approximately 1.96.
Since 0.707 < 1.96, we do not reject the null hypothesis at a 5% significance level; the adjustment can be taken as correct.
$p = \frac{x_1 + x_2}{n_1 + n_2}$
$p = \frac{3 + 2}{80 + 130} = \frac{5}{210} = 0.0238$
$SE = \sqrt{0.00016238}$
$SE \approx 0.01274$
z-test statistic:
$z = \frac{p_1 - p_2}{SE}$
$z = \frac{0.0375 - 0.0154}{0.01274} = \frac{0.0221}{0.01274}$
$z \approx 1.731$
           Plant A   Plant B
Size       300       200
Mean       75.4      74.3
Variance   65.6      57.8
Ans.4 Given:
• NA = 300
• NB = 200
• X̄A = 75.4
• X̄B = 74.3
Standard Deviation = √Variance
∴ σA² = 65.6; σB² = 57.8
z-test statistic:
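The working stops before the statistic is evaluated. A sketch of the usual two-sample z statistic for large samples, using the sizes, means and variances listed above, is given below; the function name is illustrative and the formula is the standard one rather than one confirmed by the source.

```python
from math import sqrt

def two_sample_z(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """z statistic for the difference of two means with known variances."""
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error

z_plants = two_sample_z(75.4, 65.6, 300, 74.3, 57.8, 200)
print(round(z_plants, 3))
```

Since the result is below 1.96, the two-tailed critical value at the 5% level, the difference between the plant means would not be judged significant under these assumptions.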
5. A machine is set to produce particular characteristics with mean 21.3 and S.D 0.4. A
random sample of 625 observations has 21.33 as mean. Test whether the sample mean
differ significantly from population mean.
Ans.5 Given:
- Population standard deviation (𝜎) = 0.4 cm,
- Sample mean (x̅) = 21.33 cm,
- Sample size (n) = 625,
- Hypothesized population mean (𝜇) = 21.3 cm,
- Significance level α = 0.05.
z-test statistic: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{21.33 - 21.3}{0.4 / \sqrt{625}} = \frac{0.03}{0.016} = 1.875$
Compare with the critical value:
The critical value for a two-tailed test at a 5% level of significance is approximately 1.96.
Since 1.875 < 1.96, we do not reject the null hypothesis at a 5% significance level; the sample mean does not differ significantly from the population mean.
6. Out of 10,000 pumpkins harvested, 1000 were randomly selected. 8% were found to be
rotten. The grower claims that only 7% are rotten. Is this claim tenable? Test at 5%
level of significance.
Ans.6 Given:
• p1 = 0.08 (proportion of rotten pumpkins in the sample),
• p2 = 0.07 (claimed proportion by the grower),
• n =1000 (sample size)
z-test statistic:
$z = \frac{p_1 - p_2}{\sqrt{\frac{p_2 (1 - p_2)}{n}}} = \frac{0.08 - 0.07}{\sqrt{\frac{0.07 (1 - 0.07)}{1000}}} = \frac{0.01}{\sqrt{0.0000651}} = \frac{0.01}{0.00807} = 1.239$
The critical value for a two-tailed test at a 5% level of significance is approximately 1.96.
Since 1.239 < 1.96, we do not reject the null hypothesis at a 5% significance level; the grower's claim is tenable.
7. A group of seven–week–old chickens reared on a high protein diet weigh 12, 15, 11, 16,
14, 14 and 16 ounces. In another group, 5 chicken received low protein diet and weigh 8,
10, 14, 10, and 13. Test whether there is significant increase in weight due to high protein,
use 5% level of significance.
Ans.7
• X̄1 = 14 (sample mean, high protein group)
• s1 = 1.92 (sample standard deviation)
• n1 = 7
• X̄2 = 11 (sample mean, low protein group)
• s2 = 2.45
• n2 = 5
t-test :
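The t statistic itself is not shown above. One common choice for two small independent samples is the pooled-variance t test; the sketch below applies it to the chicken weights, with the caveat that the source does not confirm which formula it intended. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

high_protein = [12, 15, 11, 16, 14, 14, 16]
low_protein = [8, 10, 14, 10, 13]

def pooled_t(sample_1, sample_2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error, df

t_stat, df = pooled_t(high_protein, low_protein)
print(round(t_stat, 3), df)
```

With 10 degrees of freedom the one-tailed 5% critical value is about 1.812, so under this formula the increase in weight would be judged significant.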
8. Table depicts the strength test results of two yarns. Is there a significant difference in
the mean? Test at 5% level of significance.
Table 9.9: Strength Results of the Two Yarns
         Sample Size   Mean   Sample Variance
Type A   4             52     42
Type B   9             42     56
Ans.8
Given data:
• For Type A:
• n1 = 4
• X̄1 = 52
• s1² = 42
• For Type B:
• n2 = 9
• X̄2 = 42
• s2² = 56
• t-test:
The critical t-value for degree of freedom 11 (n1 + n2 − 2) at the 5% level of significance is
approximately ±2.201. Since 2.44 is greater than 2.201, we reject the null hypothesis.
9. The table 9.10 depicts the results related to the memory capacity of 10 students before
and after training. Test at 5% level of significance whether training is effective.
Table 9.10: Memory Capacity of 10 Students
Roll no. 1 2 3 4 5 6 7 8 9 1
Before Training 1 14 11 8 7 1 3 0 5 6
After Training 1 16 10 7 5 1 10 2 3 8
Ans.9
Solution:
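n = 10, n - 1 = 9
Mean Difference: $\bar{D} = \frac{\sum d_i}{n} = \frac{7}{10} = 0.7$
$S_d = \sqrt{\frac{\sum d_i^{2} - \frac{(\sum d_i)^{2}}{n}}{n - 1}} = \sqrt{\frac{71 - \frac{(7)^{2}}{10}}{9}} = \sqrt{\frac{66.1}{9}} = \sqrt{7.34} = 2.71$
t-test: $t = \frac{\bar{D}}{S_d / \sqrt{n}} = \frac{0.7}{2.71 / \sqrt{10}} = \frac{0.7 \times \sqrt{10}}{2.71} = \frac{2.21}{2.71} = 0.82$
The critical t-value for degree of freedom 9 is approximately ±2.262. Since 0.82 is less than
2.262, we do not reject the null hypothesis; the data do not show that the training is effective.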
UNIT 10
1. 400 items of each (material) were given treatment ‘x’ and ‘y’ to enhance the strength
of the material. 80 gained strength by treatment ‘x’ and 20 gained strength by treatment
‘y’. Does the gain in strength depend on the treatment?
Ans.1
Given Data:
• Items given each treatment: 400
• Items gaining strength by treatment 'x': 80
• Items gaining strength by treatment 'y': 20
Observation Table:
                      Treatment ‘x’   Treatment ‘y’   Total
Gained Strength       80              20              100
Not Gained Strength   320             380             700
Total                 400             400             800
Chi-Square Statistic:
$\chi^{2} = \sum \frac{(O_i - E_i)^{2}}{E_i}$
The expected frequencies are 50 for each "Gained Strength" cell and 350 for each "Not Gained Strength" cell, so χ2 = 900/50 + 900/50 + 900/350 + 900/350 = 41.14.
The critical chi-square value for degree of freedom 1 is approximately 3.841. Since χ2 (41.14)
is greater than the critical value (3.841), we reject the null hypothesis.
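The chi-square statistic for a contingency table can be computed directly from the observed counts, with each expected frequency taken as row total × column total / grand total. The sketch below uses an illustrative function name.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: gained / not gained strength; columns: treatment 'x' / treatment 'y'
chi2, df = chi_square_independence([[80, 20], [320, 380]])
print(round(chi2, 2), df)
```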
2. The demand for a particular spare part was found to vary from day to day. Table 10.6
depicts the information obtained in a sample study. Test the hypothesis that the number
demanded depends upon the day.
Table 10.6: Spare Part Demand from Monday to Saturday
Days Mon Tue Wed Thur Fri Sat
Quantity Demanded 1124 1125 1110 1120 1126 1115
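Ans.2 Given Data: Table 10.6
Under the null hypothesis that demand does not depend on the day, the expected demand for each day is 6720/6 = 1120.
Chi-Square Statistic:
$\chi^{2} = \sum \frac{(O_i - E_i)^{2}}{E_i} = \frac{16 + 25 + 100 + 0 + 36 + 25}{1120} = \frac{202}{1120} = 0.180$
The critical chi-square value for degree of freedom 5 is approximately 11.070. Since χ2 (0.180)
is less than the critical value (11.070), we accept the null hypothesis.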
3. In a survey of 200 boys, of which 75 were intelligent, 40 had skilled fathers. While 85
of the unintelligent boys had unskilled fathers. Can we say on the basis of the information
that skilled fathers had intelligent boys?
Ans.3 Given:
Observed Frequency:
                     Skilled father   Unskilled father   Total
Intelligent boys     40               75 – 40 = 35       75
Unintelligent boys   125 – 85 = 40    85                 200 – 75 = 125
Total                40 + 40 = 80     35 + 85 = 120      200
Degrees of Freedom:
df = (Number of Rows−1) × (Number of Columns−1) = (2−1) × (2−1) =1
The critical chi-square value for degree of freedom 1 is approximately 3.841. Since χ2 (8.888)
is greater than the critical value (3.841), we reject the null hypothesis.
4. The number of car accidents per month in a town was as follows: 6, 9, 4, 12, 8, 20, 14,
15, 2, and 10. Test the hypothesis that the number of accidents is same every month.
Ans.4
Given the total number of accidents over the 10 months are: 6 + 9 + 4 + 12 + 8 + 20 + 14 + 15
+ 2 + 10 = 100.
Under the null hypothesis, these accidents should be uniformly distributed over the 10 months
period and hence the expected number of accidents for each of the 10 months are 100/10 = 10.
Month   Observed No. of accidents (fo)   Expected No. of accidents (fe)   (fo - fe)   (fo - fe)2   (fo - fe)2 / fe
1       6                                10                               -4          16           1.6
2       9                                10                               -1          1            0.1
3       4                                10                               -6          36           3.6
4       12                               10                               2           4            0.4
5       8                                10                               -2          4            0.4
6       20                               10                               10          100          10.0
7       14                               10                               4           16           1.6
8       15                               10                               5           25           2.5
9       2                                10                               -8          64           6.4
10      10                               10                               0           0            0.0
Total   100                              100                                                       26.6
$\chi^{2} = \sum \frac{(f_o - f_e)^{2}}{f_e} = 26.6$
Degrees of Freedom:
df = (Number of Categories −1) = 10 - 1 = 9
The critical chi-square value for degree of freedom 9 is 16.919 at the 5% level of significance
(21.666 at the 1% level). Since χ2 (26.6) is greater than the critical value at either level, we reject the null hypothesis.
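The goodness-of-fit statistic in the table above can be reproduced with a few lines. This sketch assumes equal expected frequencies, as the null hypothesis states; the function name is illustrative.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against equal expected frequencies, with its df."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1

accidents = [6, 9, 4, 12, 8, 20, 14, 15, 2, 10]
chi2_accidents, df_accidents = chi_square_uniform(accidents)
print(round(chi2_accidents, 2), df_accidents)
```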
5. In a particular industry the post graduate, graduate, undergraduates are in the ratio
2:3:5. A firm belonging to the industry had 400, 550 and 1050 postgraduates, graduates
and undergraduates on its pay-roll. Do they follow earlier observation about the
industry?
Ans.5 Given Data:
• Observed frequencies in the firm: 400 postgraduates, 550 graduates, 1050
undergraduates
• Expected ratio in the industry: 2:3:5
Expected Frequencies in the Firm (Based on Industry Ratio):
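• Total observed count in the firm: 400+550+1050=2000
• Expected count for postgraduates: $\frac{2}{10} \times 2000 = 400$
• Expected count for graduates: $\frac{3}{10} \times 2000 = 600$
• Expected count for undergraduates: $\frac{5}{10} \times 2000 = 1000$
$\chi^{2} = \sum \frac{(O_i - E_i)^{2}}{E_i} = \frac{(400 - 400)^{2}}{400} + \frac{(550 - 600)^{2}}{600} + \frac{(1050 - 1000)^{2}}{1000} = 0 + 4.167 + 2.5 = 6.667$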
The critical chi-square value for degree of freedom 2 is approximately 5.991. Since χ2 (6.667)
is greater than the critical value (5.991), we reject the null hypothesis.
6. Three hundred digits were chosen at random from a set of tables. The frequencies of
the digits were as follows:
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 28 29 33 31 26 35 32 30 31 25
Using Chi-square test assess the hypothesis that the digits were distributed in equal
numbers in the table.
Ans.6 The total number of digits is 28 + 29 + 33 + 31 + 26 + 35 + 32 + 30 + 31 + 25 = 300.
Under the null hypothesis, the expected frequency for each of the 10 digits is
300/10 = 30.
$\chi^{2} = \sum \frac{(f_o - f_e)^{2}}{f_e} = \frac{4 + 1 + 9 + 1 + 16 + 25 + 4 + 0 + 1 + 25}{30} = \frac{86}{30} = 2.87$
Degrees of Freedom:
df = (Number of Categories −1) = 10 - 1 = 9
The critical chi-square value for degree of freedom 9 is 16.919 at the 5% level of significance
(21.666 at the 1% level). Since χ2 (2.87) is less than the critical value, we accept the null hypothesis.
UNIT 11
1. Table 11.8 depicts the data of the number of claims processed per day of a group of
four employees of XYZ Insurance Company observed for a number of days. Test the
hypothesis that the employees mean claims per day are all the same. Use 5% level of
significance.
Table 11.8: Claims Processed per Day of Four Employees of an XYZ
Insurance Company
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9
Ans.1 Given Data:
Claims processed per day for four employees of XYZ Insurance Company.
Employee 1: 15 17 14 12
Employee 2: 12 10 13 17
Employee 3: 11 14 13 15 12
Employee 4: 13 12 12 14 10 9
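Group Mean:
Mean (Employee 1) = (15 + 17 + 14 + 12) / 4 = 14.5
Mean (Employee 2) = (12 + 10 + 13 + 17) / 4 = 13
Mean (Employee 3) = (11 + 14 + 13 + 15 + 12) / 5 = 13
Mean (Employee 4) = (13 + 12 + 12 + 14 + 10 + 9) / 6 = 11.67
Sums of Squares (N = 19, T = 245):
Correction Factor = T²/N = 245²/19 = 3159.21
SS_Between = (58²/4 + 52²/4 + 65²/5 + 70²/6) − 3159.21 = 3178.67 − 3159.21 = 19.46
SS_Total = 3245 − 3159.21 = 85.79
SS_Within = SS_Total − SS_Between = 85.79 − 19.46 = 66.33
F-statistic:
MS_Between = 19.46 / 3 = 6.49 and MS_Within = 66.33 / 15 = 4.42
F = MS_Between / MS_Within = 6.49 / 4.42 ≈ 1.47
The table value of F at the 5% level of significance for (3, 15) degrees of freedom is 3.29.
Since 1.47 < 3.29, we fail to reject the null hypothesis: the employees' mean claims per day do
not differ significantly.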
2. Four makes of bulbs were tested for their length of life (in ‘000 hours) and the data
obtained is depicted in table 11.9. Test whether the length of their life is significantly
different.
Table 11.9: Four Different Makes of Bulbs with Their Length of Life
MAKE I    MAKE II    MAKE III    MAKE IV
20        19         21          15
23        15         19          17
18        17         20          16
17        20         17          18
          16         16
Ans.2
N = 18, T = Sum of all observations = 324
Correction Factor = T²/N = 324²/18 = 5832
SST (Total Sum of Squares) = Sum of squares of all observations − T²/N
= (20² + 23² + 18² + 17² + ……. + 16² + 18²) − 5832 = 5914 − 5832 = 82
SSC (between makes):
$SSC = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4}\right] - 5832 = \frac{78^2}{4} + \frac{87^2}{5} + \frac{93^2}{5} + \frac{66^2}{4} - 5832$
SSC = (1521 + 1513.8 + 1729.8 + 1089) − 5832 = 21.6
SSE = SST − SSC = 82 − 21.6 = 60.4
MSC = SSC / 3 = 7.2 and MSE = SSE / 14 = 4.31
F-statistic:
$F_{Cal} = \frac{MSC}{MSE} = \frac{7.2}{4.31} = 1.67$
The table value of F at the 5% level of significance for (3, 14) degrees of freedom (df) is 3.34.
Since 1.67 < 3.34, we fail to reject the null hypothesis. Therefore, the difference in length of
life among the four makes is not significant.
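The correction-factor method used for the bulb data can be sketched in code as follows. This is a minimal illustration in plain Python, not a library routine; the function name `one_way_anova` is hypothetical, and the fifth-row values of Table 11.9 are assumed to belong to Make II and Make III, which is consistent with the column totals 78, 87, 93 and 66 used in the answer.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method.

    Returns (ssc, sse, df_between, df_within, f_ratio).
    """
    n = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)
    cf = total ** 2 / n                                   # correction factor T^2 / N
    sst = sum(x ** 2 for g in groups for x in g) - cf     # total sum of squares
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # between-group sum of squares
    sse = sst - ssc                                       # within-group (error) sum of squares
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (ssc / df_between) / (sse / df_within)
    return ssc, sse, df_between, df_within, f_ratio


# Table 11.9 (assumed column assignment, see the note above)
bulbs = [
    [20, 23, 18, 17],      # Make I
    [19, 15, 17, 20, 16],  # Make II
    [21, 19, 20, 17, 16],  # Make III
    [15, 17, 16, 18],      # Make IV
]
ssc, sse, df_b, df_w, f_ratio = one_way_anova(bulbs)
print(round(ssc, 2), round(sse, 2), df_b, df_w, round(f_ratio, 2))
```

The same function accepts groups of unequal size, so it can also be applied to the claims data of Question 1.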
3. Table 11.10 depicts the data on production rate by five workmen on four machines.
Test whether the rate is significantly different due to workers and machines.
Table 11.10: Production Rate of Five Workmen on Four Machines
Machines \ Workmen    I    II    III    IV    V
1 46 48 36 35 40
2 40 42 38 40 44
3 49 54 46 48 51
4 38 45 34 35 41
Ans.3 N = 20, T = Sum of all values = 850
Correction Factor = T²/N = 850²/20 = 36125
SST (Total Sum of Squares) = Sum of squares of all observations − T²/N
= (46² + 40² + 49² + 38² + ……. + 51² + 41²) − 36125 = 36754 − 36125 = 629
SSC (between workmen):
$SSC = \left[\frac{(\sum X_{1i})^2}{n_1} + \frac{(\sum X_{2i})^2}{n_2} + \cdots + \frac{(\sum X_{ni})^2}{n_n}\right] - \frac{T^2}{N} = \frac{173^2 + 189^2 + 154^2 + 158^2 + 176^2}{4} - 36125 = 201.5$
SSR (between machines):
$SSR = \left[\frac{(\sum X_{1j})^2}{n_1} + \frac{(\sum X_{2j})^2}{n_2} + \cdots + \frac{(\sum X_{nj})^2}{n_n}\right] - \frac{T^2}{N} = \frac{205^2 + 204^2 + 248^2 + 193^2}{5} - 36125 = 353.8$
SSE = SST − SSC − SSR = 629 − 201.5 − 353.8 = 73.7
MSC = 201.5 / 4 = 50.38, MSR = 353.8 / 3 = 117.93 and MSE = 73.7 / 12 = 6.14
Since MSC > MSE we take $F_c = \frac{MSC}{MSE}$, and since MSR > MSE we take $F_r = \frac{MSR}{MSE}$.
$F_c = \frac{MSC}{MSE} = \frac{50.38}{6.14} = 8.20$
$F_r = \frac{MSR}{MSE} = \frac{117.93}{6.14} = 19.20$
For Workmen:
The calculated value of F_c is 8.20. The table value of F for (4,12) df at 5% level of significance
is 3.26. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between workmen is significant.
For Machine:
The calculated value of F_r is 19.20. The table value of F for (3,12) df at 5% level of significance
is 3.49. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between machines is significant.
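The two-way calculation above (one observation per machine and workman) can be reproduced with a short script. This is a sketch under the assumption of no replication and no interaction term; the function name `two_way_anova` is illustrative only.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    `table` is a list of rows; rows are one factor, columns the other.
    Returns (f_cols, f_rows, df_cols, df_rows, df_error).
    """
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)
    cf = total ** 2 / n                                      # correction factor T^2 / N
    sst = sum(x ** 2 for row in table for x in row) - cf     # total sum of squares
    ssr = sum(sum(row) ** 2 for row in table) / c - cf       # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf  # between columns
    sse = sst - ssr - ssc                                    # residual
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    mse = sse / df_error
    return (ssc / df_cols) / mse, (ssr / df_rows) / mse, df_cols, df_rows, df_error


# Table 11.10: rows are machines 1-4, columns are workmen I-V
production = [
    [46, 48, 36, 35, 40],
    [40, 42, 38, 40, 44],
    [49, 54, 46, 48, 51],
    [38, 45, 34, 35, 41],
]
f_workmen, f_machines, df_c, df_r, df_e = two_way_anova(production)
print(round(f_workmen, 2), round(f_machines, 2), df_c, df_r, df_e)
```

Each F ratio is compared with the table value for its own degrees of freedom, (df_cols, df_error) for columns and (df_rows, df_error) for rows.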
4. The percentage of sugar content of tobacco in two samples is depicted in table 11.11.
Test whether their population variances are same.
Table 11.11: Percentage of Sugar Content of Tobacco in Two Samples
Sample A 2.4 2.7 2.6 2.1 2.5
Sample B 2.7 3.0 2.8 3.1 2.2 3.6
Ans.4
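Sample A:
$\bar{x} = \frac{2.4 + 2.7 + 2.6 + 2.1 + 2.5}{5} = 2.46$
$S_1^2 = \frac{1}{n_1 - 1}\sum (x_i - \bar{x})^2 = \frac{1}{4}\left[(2.4 - 2.46)^2 + (2.7 - 2.46)^2 + (2.6 - 2.46)^2 + (2.1 - 2.46)^2 + (2.5 - 2.46)^2\right] = \frac{0.212}{4} = 0.053$
Sample B:
$\bar{y} = \frac{2.7 + 3.0 + 2.8 + 3.1 + 2.2 + 3.6}{6} = 2.9$
$S_2^2 = \frac{1}{n_2 - 1}\sum (y_i - \bar{y})^2 = \frac{1}{5}\left[(2.7 - 2.9)^2 + (3.0 - 2.9)^2 + (2.8 - 2.9)^2 + (3.1 - 2.9)^2 + (2.2 - 2.9)^2 + (3.6 - 2.9)^2\right] = \frac{1.08}{5} = 0.216$
F-statistic (larger variance in the numerator):
$F = \frac{S_2^2}{S_1^2} = \frac{0.216}{0.053} \approx 4.08$
Degrees of Freedom:
df1 (numerator, Sample B) = 6 − 1 = 5
df2 (denominator, Sample A) = 5 − 1 = 4
The table value of F at the 5% level of significance for (5, 4) degrees of freedom (df) is 6.26.
Since 4.08 < 6.26, we fail to reject the null hypothesis. Therefore, the difference between the
two population variances is not significant.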
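A variance-ratio test of this kind can be sketched with Python's standard `statistics` module. The helper name `variance_ratio` is hypothetical, and the sketch assumes the common convention of placing the larger sample variance in the numerator so that F is at least 1.

```python
from statistics import variance  # sample variance with divisor n - 1


def variance_ratio(sample_1, sample_2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    v1, v2 = variance(sample_1), variance(sample_2)
    if v1 >= v2:
        return v1 / v2, len(sample_1) - 1, len(sample_2) - 1
    return v2 / v1, len(sample_2) - 1, len(sample_1) - 1


# Table 11.11: percentage of sugar content in two tobacco samples
sample_a = [2.4, 2.7, 2.6, 2.1, 2.5]
sample_b = [2.7, 3.0, 2.8, 3.1, 2.2, 3.6]
f_value, df_num, df_den = variance_ratio(sample_a, sample_b)
print(round(f_value, 2), df_num, df_den)
```

The resulting F is compared with the table value for (df_num, df_den) degrees of freedom.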
5. Three students determine the moisture content of samples of a powder, each student
taking a sample from each of 4 consignments. The results are given below:
Consignment
Students
I II III IV
1 9 10 9 10
2 12 12 10 11
3 11 11 9 12
Ans.5
$F_c = \frac{MSC}{MSE} = \frac{1.88}{0.47} = 4.0$
$F_r = \frac{MSR}{MSE} = \frac{3.25}{0.47} = 6.91$
For Consignment:
The calculated value of F_c is 4. The table value of F for (3,6) df at 5% level of significance is
4.76. Since the calculated value of F is less than the table value, we fail to reject the null
hypothesis and conclude that the difference between consignments is not significant.
For Student:
The calculated value of F_r is 6.91. The table value of F for (2,6) df at 5% level of significance
is 5.14. Since the calculated value of F is greater than the table value, we reject the null
hypothesis and conclude that the difference between students is significant.
UNIT 12
1. Table 12.11 depicts the marks obtained by 10 students in commerce and statistics.
Calculate the rank correlation.
Marks in Statistics 35 90 70 40 95 45 60 85 80 50
Marks in Commerce 45 70 65 30 90 40 50 75 85 60
Ans.1
Marks in Statistics   Rank 1 (R1)   Marks in Commerce   Rank 2 (R2)   D = R1 - R2   D²
35 10 45 8 2 4
90 2 70 4 -2 4
70 5 65 5 0 0
40 9 30 10 -1 1
95 1 90 1 0 0
45 8 40 9 -1 1
60 6 50 7 -1 1
85 3 75 3 0 0
80 4 85 2 2 4
50 7 60 6 1 1
∑D = 0, ∑D² = 16
$r = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(16)}{10(10^2 - 1)} = 1 - \frac{96}{990} = 1 - 0.0970 = 0.9030$
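The ranking and the rank-correlation formula above can be checked with a short script. This is a minimal sketch that assumes no tied values; `ranks_desc` and `spearman` are illustrative names, not functions from any library.

```python
def ranks_desc(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's rank correlation r = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    rx, ry = ranks_desc(x), ranks_desc(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


statistics_marks = [35, 90, 70, 40, 95, 45, 60, 85, 80, 50]
commerce_marks = [45, 70, 65, 30, 90, 40, 50, 75, 85, 60]
r = spearman(statistics_marks, commerce_marks)
print(round(r, 4))
```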
2. Calculate Spearman’s rank correlation coefficient between the series A and B depicted
in table 12.12.
Table 12.12: Series Data of Terminal Question 2
Series 1 57 59 62 63 64 65 55 58 57
Series 2 113 117 126 126 130 129 111 116 112
Ans.2
Series 1   Rank 1 (R1)   Series 2   Rank 2 (R2)   D = R1 - R2   D²
57 7.5 113 7 0.5 0.25
59 5 117 5 0 0
62 4 126 3.5 0.5 0.25
63 3 126 3.5 -0.5 0.25
64 2 130 1 1 1
65 1 129 2 -1 1
55 9 111 9 0 0
58 6 116 6 0 0
57 7.5 112 8 -0.5 0.25
∑D = 0, ∑D² = 3
Here the value 57 is repeated twice in Series 1 and the value 126 is repeated twice in Series 2.
Therefore, in Series 1, m1 = 2 and in Series 2, m2 = 2.
$R = 1 - \frac{6\left(\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right)}{N^3 - N}$
$R = 1 - \frac{6\left(3 + \frac{1}{12}(8 - 2) + \frac{1}{12}(8 - 2)\right)}{9^3 - 9} = 1 - \frac{6(4)}{720} = 1 - 0.033 = 0.967$
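When values repeat, the ranks are averaged and a correction of (m³ − m)/12 is added for each group of m tied values. The sketch below implements that adjusted formula in plain Python; the helper names are illustrative only.

```python
from collections import Counter


def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    """Return (sum of D^2, tie-adjusted rank correlation)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = d2 + tie_correction(x) + tie_correction(y)
    return d2, 1 - 6 * adjusted / (n ** 3 - n)


series_1 = [57, 59, 62, 63, 64, 65, 55, 58, 57]
series_2 = [113, 117, 126, 126, 130, 129, 111, 116, 112]
sum_d2, rank_corr = spearman_with_ties(series_1, series_2)
print(sum_d2, round(rank_corr, 3))
```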
3. For the data in table 12.13, obtain the two lines of regression and its estimation of the
blood pressure when age is 50 yrs.
Age in years(X) 56 42 72 39 63 47 52 49 40 42 68 60
BP (Y) 127 112 140 118 129 116 130 125 115 120 135 133
Ans.3
4. Table 12.14 depicts the results that were worked out from scores in statistics and
mathematics in a certain examination.
Table 12.14: Results of Scores in Statistics and Mathematics Examination
∴ Regression line of Y on X is
Y − Ȳ = byx (X − X̄)
Y – 47.5 = 0.692(X – 39.5)
Y = 0.692X + 20.17
When X = 50,
Y = 54.77
∴ Regression coefficient of X on Y
bxy = r · (σx / σy) = 0.42 × (10.8 / 17.8) = 0.255
∴ Regression line of X on Y is
X − X̄ = bxy (Y − Ȳ)
X – 39.5 = 0.255(Y – 47.5)
X = 0.255Y + 27.39
When Y = 30,
X = 35.04
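The same two regression lines can be obtained directly from raw data by least squares. As a sketch only, the code below applies the formulas to the age and blood-pressure figures of Question 3; the function name `regression_lines` is hypothetical, and the slopes are computed from sums of deviations, which is algebraically the same as r·σy/σx and r·σx/σy.

```python
def regression_lines(x, y):
    """Return (mean_x, mean_y, byx, bxy) for the two least-squares regression lines."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy / sxx, sxy / syy


age = [56, 42, 72, 39, 63, 47, 52, 49, 40, 42, 68, 60]
bp = [127, 112, 140, 118, 129, 116, 130, 125, 115, 120, 135, 133]
mean_age, mean_bp, byx, bxy = regression_lines(age, bp)

# Line of Y on X: Y - mean_bp = byx * (X - mean_age), evaluated at age 50
bp_at_50 = mean_bp + byx * (50 - mean_age)
print(mean_age, mean_bp, round(byx, 4), round(bxy, 4), round(bp_at_50, 2))
```

The line of Y on X is the one to use when estimating blood pressure from age; the line of X on Y would be used for the reverse estimate.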
UNIT 13
1. What is business forecasting?
Ans.1 Business forecasting refers to the analysis of past and present economic conditions with
the object of drawing inferences about probable future business conditions. The process of
making definite estimates of the future course of events is referred to as forecasting, and the
figure or statement obtained from the process is known as a ‘forecast’. The future course of
events is rarely known with certainty, so an organised system of forecasting helps a business
prepare for it. The following are aspects of scientific business forecasting:
1. Strategic Planning:
Forecasting helps in long-term strategic planning by providing insights into potential
opportunities and threats. It allows businesses to align their strategies with anticipated market
conditions.
2. Resource Allocation:
By forecasting demand for products or services, businesses can allocate resources such as
manpower, raw materials, and capital more efficiently. This prevents shortages or excesses,
optimizing operational efficiency.
3. Financial Planning:
Forecasting assists in financial planning by predicting future sales, revenues, and expenses.
This information is crucial for budgeting, setting financial goals, and ensuring the availability
of funds when needed.
4. Risk Management:
Businesses face various risks, including economic fluctuations, market changes, and
unexpected events. Forecasting helps identify potential risks, allowing organizations to
develop risk mitigation strategies and contingency plans.
5. Inventory Management:
Forecasting demand helps in managing inventory levels effectively. By avoiding overstocking
or stockouts, businesses can reduce holding costs, improve cash flow, and enhance customer
satisfaction.
2. Based on Historical Data: Business forecasting relies on historical data and trends to
identify patterns and make predictions about future events. Analysing past performance
provides a foundation for understanding and anticipating future behaviour.
3. Quantitative and Qualitative: Forecasting utilizes both quantitative and qualitative data.
Quantitative methods involve numerical data and statistical analysis, while qualitative
methods consider non-numeric factors such as market trends, customer preferences, and expert
opinions.
6. Subject to Uncertainty: Despite using historical data and advanced modeling techniques,
forecasting is inherently uncertain. Various unforeseen events, such as natural disasters,
geopolitical changes, or unexpected market shifts, can influence outcomes.
7. Multiple Methods: There are various methods and techniques for business forecasting,
each suited to different situations and data types. Common methods include time series
analysis, regression analysis, market research, and expert judgment. A combination of
methods may be used for more accurate predictions.
10. Communication Tool: Forecasts serve as a communication tool within the organization.
They are shared with stakeholders, including executives, managers, investors, and employees,
to provide a common understanding of future expectations and goals.
12. Flexibility and Adaptability: Business forecasting must be flexible and adaptable to
changing circumstances. Organizations should be prepared to adjust their forecasts and
strategies as new information emerges or as market conditions evolve.
2. Time Series Analysis: Time series analysis involves studying historical data to identify
patterns and trends over time. It includes methods like moving averages, autoregressive
integrated moving average (ARIMA) models, and seasonal decomposition. This method is
particularly useful for forecasting based on past observations.
4. Regression Analysis: Regression analysis explores the relationship between dependent and
independent variables. It is a statistical method used to model and predict the impact of one or
more factors on a target variable. Regression models can be simple linear or multiple,
depending on the number of predictors.
Each of these methods has its strengths and weaknesses, and the choice of method depends on
factors such as the nature of the data, the forecasting horizon, and the specific requirements of
the business or industry. Often, a combination of methods is used to provide a more robust and
accurate forecast.
2. Action and reaction theory: Action and reaction theory posits that economic events are
interconnected, with one event triggering a series of reactions. For instance, a government
policy change or a shift in consumer behavior may lead to a chain reaction of events in the
economy. This theory emphasizes the importance of identifying causal relationships between
economic variables.
3. Economic rhythm theory: Economic rhythm theory suggests that economic activities
exhibit rhythmic patterns or cycles. These cycles, often characterized as boom and bust phases,
are thought to follow a regular and predictable rhythm. Understanding these economic rhythms
assists in forecasting future trends and adjusting business strategies accordingly.
4. Specific historical analogy: The specific historical analogy theory involves drawing
comparisons between current economic conditions and past historical events. By identifying
similarities between the present situation and a specific historical period, forecasters can make
predictions based on the outcomes of similar circumstances in the past.
5. Cross-cut analysis theory: Cross-cut analysis theory involves examining various economic
indicators and factors simultaneously to make forecasts. Instead of focusing on one variable,
this theory emphasizes the importance of considering multiple factors that may impact each
other. The interconnectedness of different elements is crucial for a more comprehensive and
accurate forecast.
UNIT 14
1. What is meant by analysis of time series?
Ans.1 Time series analysis is a statistical method used to analyze and interpret data points
collected over time. In this analysis, data is ordered chronologically, allowing for the
examination of patterns, trends, and variations within the dataset. The primary goal of time
series analysis is to uncover meaningful insights, make predictions, or model the underlying
structure of the time-dependent data.
2. State the difference between seasonal variations and cyclical fluctuations.
Ans.2
Characteristic | Seasonal Variations | Cyclical Fluctuations
Nature | Regular and predictable patterns | Longer-term, repetitive patterns
Frequency | Repeats at fixed intervals (days, weeks, months, seasons) | No fixed interval, variable frequency
Duration | Short-term, limited duration | Longer-term, extended duration
Cause | Influenced by external factors like weather, holidays, or cultural events | Influenced by broader economic factors and business cycles
ii. Semi Averages Method: The semi averages method divides the series into two equal parts
(omitting the middle value when the number of periods is odd), computes the average of each
part, and plots each average against the mid-point of its part. The straight line joining the two
points is the trend line. It is a less subjective method compared to free hand or graphic methods.
iii. Moving Average Method: Moving averages involve calculating the average of a set
number of adjacent data points at each time period. This method helps filter out short-term
fluctuations, making the underlying trend more apparent. Common types include simple
moving averages (SMA) and weighted moving averages (WMA).
iv. Method of Least Squares: The method of least squares is a statistical approach that
minimizes the sum of the squared differences between the observed and predicted values of
the dependent variable. When applied to time series analysis, it helps identify the line that best
fits the overall trend in the data. Linear regression is an example of the method of least squares.
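The moving average method described above can be illustrated with a few lines of code. This is a minimal sketch of a simple (unweighted) moving average; the function name and the sales figures are hypothetical and serve only as an example.

```python
def simple_moving_average(series, window):
    """Return the simple moving averages of `series` for the given window length."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical annual sales figures, for illustration only
sales = [12, 15, 14, 18, 20, 19, 23]
sma3 = simple_moving_average(sales, 3)
print([round(v, 2) for v in sma3])
```

Each average is conventionally placed against the middle period of its window, so a 3-period average over 7 values gives 5 trend values and leaves the first and last periods without one.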
Long-term trend or secular trend: This component represents the underlying trend in the
data over an extended period. It indicates the overall direction in which the time series is
moving, ignoring short-term fluctuations.
Seasonal variations: Seasonal variations occur when a time series exhibits regular patterns or
cycles at specific intervals, such as daily, weekly, monthly, or yearly. These patterns are often
influenced by external factors like weather, holidays, or cultural events.
Cyclic variations: Cyclic variations are longer-term patterns that do not have a fixed duration.
Unlike seasonal variations, cyclic patterns are not as regular and might span multiple years.
Economic cycles, for example, can be considered cyclic variations.
Random variations (or residuals): Random variations, also known as residuals, represent
the irregular or unpredictable fluctuations in the time series that cannot be attributed to the
long-term trend, seasonal patterns, or cyclic variations. These variations are often caused by
random events and noise in the data.
Smoothing Trends: Moving averages are effective in smoothing out short-term fluctuations
or noise in time series data. They provide a clearer view of the underlying trends by averaging
out random variations, making it easier to identify the long-term movement.
Highlighting Patterns: Moving averages help in highlighting patterns and trends in the data,
making it easier for analysts to observe and interpret the direction in which the time series is
moving. This is especially useful for detecting cycles and identifying potential turning points.
Forecasting: Moving averages are used for forecasting future values in a time series. By
calculating moving averages over specific intervals, analysts can make predictions about the
next data point. This is particularly useful in identifying trends and making short-term
predictions.
Noise Reduction: They are effective in reducing the impact of random fluctuations or outliers,
providing a clearer picture of the overall behavior of the time series. This makes it easier to
discern meaningful patterns without being overly influenced by short-term irregularities.
Lagging Indicator: Moving averages introduce a lag in the data because they are based on
past observations. This lag may cause delayed reactions to changes in the underlying patterns,
making it less effective for real-time analysis or forecasting.
Sensitivity to Window Size: The choice of the window size (the number of data points
included in the average) can significantly impact the results. Smaller windows provide more
responsiveness to recent changes but may be sensitive to noise, while larger windows offer
smoother trends but might overlook short-term variations.
Not Suitable for Irregular Data: Moving averages may not perform well with irregular time
series data that do not exhibit consistent patterns. In such cases, the averaging process may
distort the true nature of the data.
Inability to Capture Sudden Changes: Sudden and unexpected changes in the time series,
such as abrupt shifts or outliers, can be challenging for moving averages to capture. They may
take time to adjust to significant changes in the data.
Assumption of Stationarity: Moving averages assume stationarity, meaning that the
statistical properties of the time series remain constant over time. If the time series exhibits
non-stationary behaviour, the moving average may not provide accurate insights.
6. What is meant by secular trend? Discuss any two methods of isolating trend values in
a time series.
Ans.6 Secular Trend:
A secular trend, also known as a long-term trend, refers to the underlying, persistent movement
or direction in a time series over an extended period. It represents the gradual, sustained
increase or decrease in the values of a variable. Secular trends are usually observed over years
or decades and can be influenced by factors such as economic growth, technological
advancements, population changes, or other fundamental shifts in the environment.
Moving Averages: Moving averages are a common method used to isolate the trend
component in a time series. This technique involves calculating the average of a specified
number of adjacent data points, effectively smoothing out short-term fluctuations and
highlighting the underlying trend. The choice of the window size is crucial, as smaller
windows provide more responsiveness to recent changes but may be sensitive to noise, while
larger windows offer smoother trends but might overlook short-term variations.
7. What is seasonal variation of a time series? Describe the various methods you know to
evaluate it and examine their relative merits.
Ans.7 Seasonal variation in a time series refers to the predictable and recurring patterns or
fluctuations that occur at regular intervals within a specific time period. These patterns often
correspond to calendar months, quarters, or other regular intervals. Understanding and
analysing seasonal variation is crucial in time series analysis, as it allows for better forecasting
and trend identification.
The main methods of measuring seasonal variations are:
1. Simple Average Method: The simple average method averages the values recorded for each
season (month or quarter) across all the years and expresses each seasonal average as a
percentage of the average of those seasonal averages. Its merit lies in its simplicity and ease
of application, which makes it accessible for initial assessments. However, it assumes that the
data contain no appreciable trend or cyclical movement, so the resulting indices are distorted
when such movements are present.
2. Ratio to Moving Averages Method: The ratio to moving averages method involves
dividing the value of a variable at a specific time point by the moving average of that variable
over a certain period. This method helps to smooth out short-term fluctuations and highlight
underlying trends. Its merit lies in its ability to provide a clearer picture of the relative
performance of the variable, reducing noise and emphasizing the more persistent patterns.
However, the choice of the moving average window size is crucial, and this method may lag
behind sudden changes in the data.
3. Chain or Link Relative Method: The chain or link relative method involves comparing
the current period's value with the previous period's value. This method is particularly useful
for identifying the rate of growth or decline between consecutive time points. Its merit lies in
its ability to capture the sequential relationship and highlight the direction and magnitude of
changes over time. It is especially effective when analysing data with varying growth rates.
However, it may be sensitive to outliers, and the interpretation of results depends on the base
period chosen.
4. Ratio to Trend Method: The ratio to trend method involves dividing the actual value of a
variable by its trend value for the same period, expressing the result as a percentage, and
averaging these percentages for each season. Because the trend is removed, the ratios isolate
the seasonal component, although cyclical and irregular movements remain in them. Its merit
lies in using every observation and in separating short-term seasonal fluctuations from the
long-term trend. However, accurate estimation of the trend component is crucial, and the
method may be sensitive to extreme values in the data.
8. Find a straight-line trend to the following data and find trend value.
Ans.8 The trend line can be fitted by using the method of least squares for the given data.
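Since the data table for this question is not reproduced here, the following sketch fits a straight-line trend Y = a + bt by least squares to a hypothetical series. Time is coded as deviations from the middle period so that Σt = 0, which reduces the normal equations to a = ΣY/n and b = ΣtY/Σt². The function name and the figures are assumptions for illustration.

```python
def linear_trend(values):
    """Fit Y = a + b*t by least squares with time coded so that sum(t) == 0.

    Returns (a, b, trend_values).
    """
    n = len(values)
    t = [i - (n - 1) / 2 for i in range(n)]  # e.g. -3, -2, ..., 3 for n = 7
    a = sum(values) / n
    b = sum(ti * yi for ti, yi in zip(t, values)) / sum(ti ** 2 for ti in t)
    return a, b, [a + b * ti for ti in t]


# Hypothetical production figures for seven consecutive years
production = [80, 90, 92, 83, 94, 99, 92]
a, b, trend = linear_trend(production)
print(a, b, trend)
```

With an even number of periods the coded times are half-units (±0.5, ±1.5, ...), which the same code handles without change.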
3. Relative measure: Index numbers measure changes which are not capable of direct
measurement.
4. Specified averages: An index number represents a special case of average, in general known
as a weighted average. It is a special type of average because, in a simple average, the data
are homogeneous and share the same unit of measurement, whereas an index number averages
variables that have different units of measurement.
5. Basis of comparison: Index numbers by their very nature are comparative. They compare
changes over time or between places or similar categories.
4. Construct Fisher’s ideal index for the data depicted in table 15.12
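Fisher's Method:
$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{7810}{5830} \times \frac{8475}{6457}} \times 100 = \sqrt{\frac{66189750}{37644310}} \times 100 = 132.60$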
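Fisher's ideal index is the geometric mean of the Laspeyres and Paasche indices, so it can be computed from the four aggregate totals alone. The sketch below takes the totals quoted in the answer as inputs; the function name is illustrative.

```python
from math import sqrt


def fisher_ideal_index(sum_p1q0, sum_p0q0, sum_p1q1, sum_p0q1):
    """Return (Laspeyres, Paasche, Fisher) price indices from aggregate totals."""
    laspeyres = sum_p1q0 / sum_p0q0 * 100  # base-year quantities as weights
    paasche = sum_p1q1 / sum_p0q1 * 100    # current-year quantities as weights
    return laspeyres, paasche, sqrt(laspeyres * paasche)


laspeyres, paasche, fisher = fisher_ideal_index(7810, 5830, 8475, 6457)
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Because it is a geometric mean, the Fisher value always lies between the Laspeyres and Paasche values, which gives a quick sanity check on any hand calculation.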
5.The table 15.13 depicts the price of commodities along with the weights of respective
commodities. Calculate index number for 2000 based on the year 1995.
Table 15.13: Price of Commodities along with the Weights
Commodity 1995 2000 Weights
A 0.50 0.75 2
B 0.60 0.75 5
C 2.00 2.40 4
D 1.80 2.10 8
E 8.00 10.00 1
Ans.5
Index Number for 2000 = (Total expense in 2000 / Total expense in 1995) × 100
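The formula above can be evaluated for Table 15.13 with a short script. This is a sketch that treats "total expense" as the sum of price × weight over the five commodities, which is an interpretation assumed here; the function name is illustrative.

```python
def weighted_aggregative_index(base_prices, current_prices, weights):
    """Return (sum of current price * weight) / (sum of base price * weight) * 100."""
    base_total = sum(p * w for p, w in zip(base_prices, weights))
    current_total = sum(p * w for p, w in zip(current_prices, weights))
    return current_total / base_total * 100


# Table 15.13: commodities A to E
prices_1995 = [0.50, 0.60, 2.00, 1.80, 8.00]
prices_2000 = [0.75, 0.75, 2.40, 2.10, 10.00]
weights = [2, 5, 4, 8, 1]

index_2000 = weighted_aggregative_index(prices_1995, prices_2000, weights)
print(round(index_2000, 2))
```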
12. The total sale of a product in Area A is 840 for 30 working days. The total sale of the same
product in Area B is 784 for 28 working days. Should Statistics be applied to get an appropriate
picture regarding the comparison of sales?
Ans.12 Yes
UNIT 2
1. What are the main stages in a survey? – Planning and execution
2. Training of investigators belongs to which stage? - Planning
3. Analysis of data is a part of the execution of survey. Is this correct? - Yes
4. Classify the following as finite or infinite population.
i) Production of a product in a factory for a day - Finite
ii) The set of rational numbers - Infinite
iii) The weight of newborn babies measured up to first decimal place in a state during the first
week of February 2008 - Finite
5. Classify the following as an attribute or a variable.
i) Eye colour of human beings - Attribute
ii) Number of pages in a book of various subjects - Variable
6. Classify the following as discrete or continuous variable
i) Number of shares sold each day in a stock market. - Discrete
ii) Temperatures recorded every half hour at a regional meteorological centre. - Continuous
7. Statistics can best be considered as
i) both Art and Science
ii) Art
iii) Science
iv) neither Art nor Science
8. Data that possess numerical properties are known as
i) Quantitative data
ii) Qualitative data
iii) Primary data
iv) Parametric data
9. A tool of all science in research and making an intelligent judgement is
i) Statistics
ii) Collection
iii) Data
iv) Judgement
10. State whether the following data are Primary or Secondary.
i) An official of the Census Board of India is preparing a report on census of population based
on the survey data that is collected by the Census Board. - Primary Data
ii) An HR representative of a software company is deciding on the time taken to perform a
particular job on a project based on random observations collected by him. - Primary Data
iii) A neurologist is examining the relationship between cigarette smoking and brain tumour
based on the data published in a famous neurology journal. – Secondary Data
UNIT 3
1. Classification is a systematic grouping of the units according to their common
characteristics.
2. Classification reduces bulk of the data.
3. Classification of data that are non-measurable is known as Attributes.
4. Classification done according to two attributes or variables is Two-Way Classification.
5. Manifold classification involves more than two variables.
6. Data arranged according to time of occurrence is known as Chronological classification.
7. Geographical classification means classification of data according to:
i) Location
ii) Time
iii) Attributes
iv) Class intervals
8. Classification is a process of arranging the data into:
i) Different columns
ii) Different rows
iii) Different rows and columns
iv) Groups of related facts in different classes
9. The data that can be classified on the basis of time is:
i) Geographical
ii) Chronological
iii) Qualitative
iv) Quantitative
10. State True or False
i. Tabulation presents the data in a minimum space. - TRUE
ii. Tabulation is a process of analysis - FALSE
iii. General purpose table deals with specific objectives. - FALSE
iv. Derived tables deal with total, percentages, ratios, etc - TRUE
11. i) If the data readings are 3, 4, 5, 6, 7, then it is called discrete variable. Height is generally
continuous variable.
ii) There are five derived frequency distributions for any frequency distribution.
iii) Width of class-interval is given by the difference between upper class limit and lower
class limit.
iv) There are two marginal distributions for a distribution.
v) Sturges' formula is used to calculate the number of class-intervals.
vi) The relative frequency distribution is obtained from frequency distribution by calculating
f/N.
12. i) Diagrams give an accurate value. (True/False)
ii) Pie diagram is drawn according to degree subtended at the centre of a circle. (True/False)
iii) Simple bar diagram is drawn for multiple characteristics. (True/False)
13. The graph plotted in the form of series of rectangles is
i) Frequency
ii) Frequency polygon
iii) Pie
iv) Histogram
14. The diagram which is used to show a percentage breakdown is
i) A circle
ii) A square
iii) A pie diagram
iv) A rectangle
15. A line graph indicates
i) Comparison
ii) Variation
iii) Range
iv) All the above
16. Which of the following is not a type of bar chart?
i) Multiple
ii) Percentages
iii) Subdivided
iv) Ogive
UNIT 4
1. State whether the following questions are ‘True’ or ‘False’.
i. For a given set of values if we add a constant 5 to every value, then the arithmetic mean is
affected. - TRUE
ii. Arithmetic mean can be calculated for distribution with open-end classes. - FALSE
iii. Arithmetic mean is affected by extreme values. - TRUE
iv. Arithmetic mean of 12, 16, 23, 25, 28, 32 is 22. - FALSE
2. A single value within the range of the entire mass of data that is used to represent the whole
data is
i) Measures of Central tendency
ii) Statistics
iii) Measures of Dispersion
iv) Skewness
3. Find the Arithmetic mean 68,41,75,91,53,86,59
i) 67.57
ii) 47.57
iii) 37.57
iv) 27.57
4. The average computed by considering the relative importance of each of values to the total
value, is called
i) Arithmetic mean
ii) Geometric mean
iii) Weighted arithmetic mean
iv) Harmonic average.
5. State whether the following questions are ‘True’ or ‘False’.
i) Mode is based on all values - FALSE
ii) Mode = 3 Median – Mean - FALSE
iii) Geometric mean is used when we are interested in rate of growth of any phenomena -
TRUE
iv) Harmonic mean exists if one of the values is zero. - FALSE
v) A.M < G.M < H.M for any two values ‘a’ and ‘b’. - FALSE
vi) Arithmetic mean can be calculated accurately even when the distribution has open-end
class. - FALSE
vii) Mode can be located graphically. - TRUE
viii) Mode is used when data is on interval scale. - TRUE
6. If the values of the variables are arranged in ascending order of magnitude, the middle term
is
i) mean
ii) mode
iii) median
iv) quartile
7. In a symmetrical distribution the mean, median and mode
i) differ
ii) coincide
iii) mean-median = mode
iv) differ by 0.5
8. The relation between mean, median and mode is given by
i) Mode= 3 Median-2 Mean
ii) Mode=2 Mean-Median
iii) Mode= 3Median –Mean
iv) Mode= Mean- Median
9. The harmonic mean of 30 and 20 is
i) 25
ii) 24
iii) 20
iv) 30
10. If assumed mean A = 32.5, i = 8, Σfd = -13 and Σf = 90, then
i) mean = 35.31
ii) mean=31.35
iii) mean = 33.15
iv) mean=35.35
11. In any distribution when the original items differ in size, the value of Arithmetic mean
(AM), Geometric mean (GM) and Harmonic mean (HM) would also differ in the following
order
i) AM>GM>HM
ii) AM=GM=HM
iii) AM<HM<GM
iv) AM.GM>HM
12. State whether the following questions are ‘True’ or ‘False’.
i) Quartiles are positional value. - TRUE
ii) Quartiles help us to find percentage of readings below or above a certain value. - TRUE
iii) Q2 = P50 = D7 = Median - FALSE
13. State whether the following questions are ‘True’ or ‘False’.
i) The cost of living index numbers calculated are based on weighted averages. - TRUE
ii) Many of the items which we use in our life can be assigned weights. - TRUE
14. State whether the following questions are ‘True’ or ‘False’
i. Standard deviation is based on all the values. -TRUE
ii. Standard deviation of a set of values is increased if every value of the set is increased by a
constant. - FALSE
iii. Standard deviation can be calculated for distributions with open-end classes. - FALSE
iv. Coefficient of variation can be used to compare the variability of two sets of data measuring
the same characteristics. - TRUE
UNIT 5
1. To which approach does the following probability estimates belong:
i. Probability that India will win the game - Subjective approach
ii. Probability that Mr. Ram will resign from the post - Subjective approach
iii. Probability of drawing a red card - Mathematical approach
iv. Probability that you will go to America this year - Subjective approach
2. Find the probabilities in the following cases:
i. Getting an even number when a die is thrown – 1/2
ii. Getting 53 Mondays in ordinary year – 1/7
3. Given P(A) = 0.6, P(B) = 0.7, and P (A ∩ B) = 0.5. Find P (A U B)?
Ans.3 P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.6 + 0.7 – 0.5 = 0.8
4. State whether the following questions are true or false:
i. Bayes’ probability estimates sample value - FALSE
ii. Conditional probability can incorporate costs - FALSE
iii. Bayes’ probability gives up to date information - TRUE
5. Fill in the blanks:
i. For a random variable ∑ P(Xi) =1.
ii. Expectation of a random variable is same as mean of the probability distribution of that
variable.
iii. Var (X) = E (X²) – [E(X)]².
UNIT 6
1. State whether the following statements are ‘True’ or ‘False’.
i) The sum of probabilities sometimes will be greater than 1. - FALSE
ii) The amount of time you study for an exam is a discrete random variable. - FALSE
iii) The Bernoulli distribution has only one parameter ‘p.’ - TRUE
2. State whether the following statements are ‘True’ or ‘False’.
i) Mean of binomial distribution is ‘npq.’ - FALSE
ii) ‘n’ and ‘p’ are the parameters of Binomial distribution. - TRUE
iii) If the mean and variance of a Binomial distribution are 6 and 5, then p = 1/6. - TRUE
iv) Each trial in a binomial experiment has the different probability of success ‘p’- FALSE
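Statement iii) above relies on the binomial relations mean = np and variance = npq. A minimal sketch of that arithmetic follows; the variable names are illustrative only.

```python
from fractions import Fraction

mean = Fraction(6)      # np, as given in statement iii)
variance = Fraction(5)  # npq, as given in statement iii)

q = variance / mean     # npq / np = q
p = 1 - q               # probability of success
n = mean / p            # number of trials implied by np = 6

print("q =", q, "p =", p, "n =", n)
```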
3. State whether the following statements are ‘True’ or ‘False’
i) ‘X’ is a Poisson variate if p < 0.1 and n > 10. - TRUE
ii) Poisson distribution is a unimodal distribution. - TRUE
4. State whether the following statements are ‘True’ or ‘False’.
i) Quartile deviation of normal distribution is 4/ 5 𝜎. - FALSE
ii) Mean and standard deviation of Standard normal distribution are ‘1’ and ‘0’. - FALSE
iii) Mean, Median and Mode coincide in a normal distribution - TRUE
UNIT 7
1. State whether the following statements are True or False.
i) Population is aggregate of objects under study. - TRUE
ii) Sampling method consume time and resources. - FALSE
iii) Population is a subset of sample. - FALSE
iv) An unbiased sample gives an accurate prediction of characteristics of an entire population.
- TRUE
v) The standard deviation of sampling distribution of a statistic is known as standard error of
that statistic. - TRUE
vi) Standard error is used as a reliability measure. - TRUE
vii) Faulty selection of sample contributes to sampling error. - TRUE
viii) Personal bias increases the non-sampling errors. - TRUE
ix) Unbiased errors are cumulative in nature. - FALSE
2. State whether the following statements are true ‘T’ or false ‘F’.
i) Sample in which units are selected by judgment is known as probability sample. - FALSE
ii) Judgment sampling does not give representativeness of a sample. - TRUE
iii) Large sample size always results in minimising the standard error. - TRUE
iv) A sampling plan that divides the population into well-defined groups from which random
samples are drawn is known as cluster sampling. - FALSE
v) The principles of simple random sampling are the theoretical basis for statistical inference.
- TRUE
vi) If the mean of a certain population is 20, it is likely that most of the sample means will be
20. - FALSE
vii) Any sampling distribution can be totally described by its mean and standard deviation. -
FALSE
viii) The central limit theorem assures the sampling distribution of the mean approaches
normal distribution as the sample size increases. - TRUE
ix) Stratified sampling is used when each group considered are more homogenous within itself
and heterogeneous between group. - FALSE
UNIT 8
1. XY Pizza has developed quite a business in Bangalore by delivering pizza orders promptly.
It guarantees that its pizzas will be delivered in 30 minutes or less from the time the order was
placed, and if the delivery is late, the pizza is free. The time that it takes to deliver each pizza
order on time is recorded in the pizza time book (PTB), and the delivery time for
those pizzas that are delivered late is recorded as 30 minutes in the PTB. A sample of 12
random entries from the PTB is shown in table 8.5.
Table 8.5: Twelve Random Entries of Pizza Delivery Time
15.3 29.5 30 10.1 30 19.6 10.8 12.2 14.8 30 22.1 18.3
i) Find the mean for the sample.
ii) From what population was this sample drawn?
iii) Can this sample be used to estimate the average time that it takes for XY Pizza to deliver
a pizza? Explain.
Ans.1 i) For the given sample the mean is 20.225 minutes.
ii) The sample was drawn from the population of all delivery times recorded in the Pizza
Time Book (PTB) of XY Pizza.
iii) No. Every delivery that takes more than 30 minutes is recorded as 30 minutes, so the
sample will underestimate the true average delivery time.
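The sample mean in part i) can be reproduced with a few lines of Python. The sketch also counts the entries recorded as exactly 30 minutes, which are the capped values behind the answer to part iii).

```python
from statistics import mean

# The twelve PTB entries from Table 8.5 (minutes).
times = [15.3, 29.5, 30, 10.1, 30, 19.6, 10.8, 12.2, 14.8, 30, 22.1, 18.3]

sample_mean = mean(times)
capped = sum(1 for t in times if t == 30)  # entries recorded at the 30-minute cap

print(f"Sample mean: {sample_mean:.3f} minutes")
print(f"Entries recorded as 30 minutes: {capped} of {len(times)}")
```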
2. Madhu, a frugal student, wants to buy a used bike. After randomly selecting 125 wanted
advertisements, he found the average price of the bike to be Rs. 3250 with a standard deviation
of Rs. 615. Establish an interval estimate for the average price of bike so that Madhu can be:
i) 68.3% certain that the population mean lies in this interval.
ii) 95.5% certain that the population mean lies in this interval.
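Ans.2 The sample standard deviation (used to estimate the population standard deviation),
the sample size and the sample mean are given as:
$\sigma_s = 615;\ n = 125;\ \bar{X} = 3250$
and the standard error is calculated as:
$\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} = \dfrac{615}{\sqrt{125}} = 55.01$
i) 68.3% certain means $\bar{X} \pm 1\sigma_{\bar{x}} = 3250 \pm 55.01$, giving a range between 3194.99 and 3305.01.
ii) 95.5% certain means $\bar{X} \pm 2\sigma_{\bar{x}} = 3250 \pm 110.02$, giving a range between 3139.98 and
3360.02.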
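As a rough check on Ans.2, the interval can be computed directly. The helper name `mean_interval` is an illustrative choice. Because the code does not round the standard error before multiplying, its last digit can differ slightly from a hand calculation.

```python
import math

def mean_interval(x_bar, s, n, k):
    """Return (lower, upper) for x_bar +/- k standard errors."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    return x_bar - k * se, x_bar + k * se

low_68, high_68 = mean_interval(3250, 615, 125, 1)  # 68.3% interval
low_95, high_95 = mean_interval(3250, 615, 125, 2)  # 95.5% interval

print(f"68.3%: Rs. {low_68:.2f} to Rs. {high_68:.2f}")
print(f"95.5%: Rs. {low_95:.2f} to Rs. {high_95:.2f}")
```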
3. Given the following confidence levels, express the lower and upper limits of the confidence
interval for these levels in terms of $\bar{X}$ and $\sigma_{\bar{x}}$ (use the normal distribution tables).
i) 54 percent - $\bar{X} \pm 0.74\,\sigma_{\bar{x}}$
ii) 75 percent - $\bar{X} \pm 1.15\,\sigma_{\bar{x}}$
iii) 94 percent - $\bar{X} \pm 1.88\,\sigma_{\bar{x}}$
iv) 98 percent - $\bar{X} \pm 2.33\,\sigma_{\bar{x}}$
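The multipliers above come from the normal table, but they can also be obtained from the inverse normal CDF in Python's standard library. This is a sketch, and `z_for_confidence` is an assumed helper name.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z multiplier for a confidence level given as a fraction."""
    # A central area of `level` leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.54, 0.75, 0.94, 0.98):
    print(f"{level:.0%}: z = {z_for_confidence(level):.2f}")
```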
4. From a population of 540, a sample of 60 individuals is taken. From this sample the mean
is found to be 6.2 and the standard deviation to be 1.368.
i) Find the estimated standard error of the mean.
ii) Construct a 96 % confidence interval of the mean.
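Ans.4 i. Because the population is finite, the standard error includes the finite population
correction factor:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}} = \dfrac{1.368}{\sqrt{60}} \times \sqrt{\dfrac{540-60}{540-1}} = 0.167$
ii. $\bar{X} \pm 2.05\,\sigma_{\bar{x}} = 6.2 \pm 2.05(0.167)$
Hence, the LCL and UCL are 5.86 and 6.54 respectively.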
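The calculation in Ans.4 can be sketched as follows. The function name `se_finite` is illustrative, and the multiplier 2.05 is the table value used in the answer.

```python
import math

def se_finite(s, n, N):
    """Standard error of the mean with the finite population correction."""
    return (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

se = se_finite(1.368, 60, 540)
z = 2.05                      # table value for 96% confidence
lcl = 6.2 - z * se            # lower confidence limit
ucl = 6.2 + z * se            # upper confidence limit

print(f"SE = {se:.3f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```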
5. For the following sample sizes and confidence levels, find the approximate ‘t’ values for
constructing confidence intervals (use the ‘t’ table).
i) n = 28; 95%
ii) n = 8; 98%
iii) n = 13; 90%
iv) n = 25; 95%
Ans.5 To find the approximate 't' values for constructing confidence intervals, we refer to
the t-distribution table. The degrees of freedom (df) determine which row of the table to use,
and they are calculated as df = n - 1, where n is the sample size.
i) n = 28; 95%
- Degrees of freedom = 27
- From the t-distribution table, for a two-sided interval with df = 27 and a confidence level of
95%, the 't' value is approximately 2.052.
UNIT 9
1. For the following cases: specify which probability distribution to use in hypothesis testing:
i. H0: μ = 27, H1: μ ≠ 27, X̄ = 33, sample σ = 4, n = 25 - ‘t’ distribution
ii. H0: μ = 98.6, H1: μ > 98.6, X̄ = 99.1, σ = 1.5, n = 50 - Normal distribution
iii. H0: μ = 3.5, H1: μ < 3.5, X̄ = 2.8, sample σ = 0.6, n = 18 - ‘t’ distribution
iv. H0: μ = 57, H1: μ > 57, X̄ = 65, sample σ = 12, n = 42 - Normal distribution
2. i) Null hypothesis states that there is a significant difference between observed and
hypothetical values. (True/False)
ii) 1% level of significance means we are ready to reject a true hypothesis in 99% of cases.
(True/False)
iii) If the Null hypothesis H0: 𝜇 =𝑋̅or H0: p = ps or H0: 𝜇1 = 𝜇2 or H0: p1 = p2 then it is two-
tailed test. (True/False)
iv) If the calculated value of a statistic is not in the rejection region R, then Ho is accepted.
(True/False)
v) 1 - 𝛽 is called power of the test. (True/False)
vi) If n1 = 300, n2 = 500, 𝜇1 = 50, 𝜇 2 = 60, 𝜎1 = 10, 𝜎 2 = 12 are results of two samples taken
from two cities A and B then we test for between means under different population.
(True/False)
vii) If n < 30, then we do not apply z test unless, population S.D is known. (True/False)
UNIT 10
1. χ² test is a non-parametric test.
2. A table with 4 rows and 2 columns has 3 degrees of freedom.
3. χ² test is wholly based on sample data.
4. If there are four rows and five columns in the classification for a χ² test, then the number of
degrees of freedom is equal to 12.
5. If the calculated χ² value is less than the tabulated χ² value, then the null hypothesis is
not rejected.
UNIT 11
1. State whether the following statements are ‘True’ or ‘False’
i) Analysis of variance is useful to test several means. - TRUE
ii) Another tool applied to test several means is Z/t–test. - FALSE
iii) F-ratio is always calculated with respect to mean square error. - TRUE
iv) The F-distribution curve depends on the degrees of freedom. - TRUE
v) In applying analysis of variance, the sample sizes must be equal. - FALSE
vi) In one-way ANOVA, the null hypothesis always states that all the population means are
different. - FALSE
vii) The F-statistic is the ratio of variance between the samples to the variance within the
samples. - TRUE
2. If we take only one factor and investigate the difference amongst its various categories
having numerous possible values, we are said to use
i) Two-way ANOVA
ii) One-way ANOVA
iii) Multi-way ANOVA
iv) Four-way ANOVA
3. The sum of squares for variance between samples is 8 and the sum of
squares for variance within samples is 24, then the sum of squares for
total variance is
i) 16
ii) 32
iii) 48
iv) 8
UNIT 12
Calculate the required correlation coefficients.
1. i. From the following data, calculate the correlation between variables 1 and 2 keeping the
3rd constant.
r12 = 0.7; r13 = 0.6 r23 = 0.4
Ans.i The correlation between variables 1 and 2 keeping the 3rd constant is given by:
$r_{12.3} = \dfrac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \dfrac{0.7 - (0.6)(0.4)}{\sqrt{1 - 0.6^2}\,\sqrt{1 - 0.4^2}} = \dfrac{0.46}{0.733} = 0.627$
iii. Given the zero order correlation coefficients, calculate the partial correlation between
variables 1 and 3 keeping the 2nd variable constant. Interpret your result.
r12 = 0.8; r13 = 0.6; r23 = 0.5
Ans.iii The correlation between variables 1 and 3 keeping the 2nd constant is given by:
$r_{13.2} = \dfrac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \dfrac{0.6 - (0.8)(0.5)}{\sqrt{1 - 0.8^2}\,\sqrt{1 - 0.5^2}} = \dfrac{0.2}{0.52} \approx 0.385$
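Both partial correlations use the same first-order formula, so a single helper covers them. This is a sketch with an assumed function name, `partial_corr`. It evaluates the expression without intermediate rounding, so the third decimal place may differ from a hand calculation.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return numerator / denominator

# Part i: r12 = 0.7, r13 = 0.6, r23 = 0.4, variable 3 held constant.
r12_3 = partial_corr(0.7, 0.6, 0.4)
# Part iii: r13 = 0.6, r12 = 0.8, r23 = 0.5, variable 2 held constant.
r13_2 = partial_corr(0.6, 0.8, 0.5)

print(f"r12.3 = {r12_3:.3f}")
print(f"r13.2 = {r13_2:.3f}")
```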
2. State whether the following statements are ‘True’ or ‘False’.
i. Scatter diagram does not give us a quantitative measure of correlation coefficient. - TRUE
ii. Correlation estimates the value of one variable from the knowledge of the other. - FALSE
iii. Correlation coefficient is an absolute measure. - FALSE
UNIT 13
State whether the following statements are ‘True’ or ‘False’.
1. Forecast is an estimate based solely on past data of the series under investigation. - FALSE
2. In time series analysis method a comparative study of variations can be made. - TRUE
3. In exponential smoothing, old observations are given increasing exponential weightage.
- FALSE
UNIT 14
1. State ‘True’ or ‘False’
i) ‘The prices of cooking oils reduce after the harvesting of oil seeds and go up after some
time’ is an example of cyclic variations in a time series. – FALSE
ii) The effect of national strikes, floods, earthquakes are examples of random variations in time
series. - TRUE
UNIT 15
1. Find out the price index number using simple aggregate method for the data represented in
table 15.3.
Table 15.3: Price of the Commodities for Years 2001 and 2002
Commodity    Price in Rs. per Quintal
             Base year 2001 (p0)    Current year 2002 (p1)
A            80                     100
B            120                    250
C            100                    150
D            200                    300
Ans.1
Here, $\sum p_0 = 500$, $\sum p_1 = 800$
$P_{01} = \dfrac{\sum p_1}{\sum p_0} \times 100 = \dfrac{800}{500} \times 100 = 160$
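The simple aggregate method is a ratio of two column totals. A minimal sketch follows, with the data of Table 15.3 held in dictionaries whose names are illustrative.

```python
# Prices in Rs. per quintal from Table 15.3.
base_prices = {"A": 80, "B": 120, "C": 100, "D": 200}      # 2001
current_prices = {"A": 100, "B": 250, "C": 150, "D": 300}  # 2002

sum_base = sum(base_prices.values())
sum_current = sum(current_prices.values())
price_index = sum_current / sum_base * 100  # simple aggregate price index

print(f"Sum p0 = {sum_base}, Sum p1 = {sum_current}, P01 = {price_index:.0f}")
```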
2. The data in table 15.10 is related to workers in an industrial town. Calculate consumer price
index number by using family budget method.
Table 15.10: Price Index and Percentage Expenditures of Items
Item of Consumption    Price Index P    Percentage Expenditure
Food                   200              50
Clothing               175              10
Fuel & Lighting        160              12
Housing                225              15
Miscellaneous          150              13
Ans.2
Item of Consumption    Price Index P    Weight W     PW
Food                   200              50           10000
Clothing               175              10           1750
Fuel & Lighting        160              12           1920
Housing                225              15           3375
Miscellaneous          150              13           1950
                                        ∑W = 100     ∑PW = 18995
$P_{01} = \dfrac{\sum PW}{\sum W} = \dfrac{18995}{100} = 189.95$
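The family budget method is a weighted average of the group price indices, with the percentage expenditures as weights. The sketch below repeats the working of Ans.2; the data structure is an illustrative choice.

```python
# (price index P, weight W) for each item in Table 15.10.
items = {
    "Food": (200, 50),
    "Clothing": (175, 10),
    "Fuel & Lighting": (160, 12),
    "Housing": (225, 15),
    "Miscellaneous": (150, 13),
}

sum_w = sum(w for _, w in items.values())
sum_pw = sum(p * w for p, w in items.values())
cpi = sum_pw / sum_w  # consumer price index by the family budget method

print(f"Sum W = {sum_w}, Sum PW = {sum_pw}, index = {cpi:.2f}")
```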
3. In any distribution when the original items differ in size, the value of Arithmetic mean
(AM), Geometric mean (GM) and Harmonic mean (HM) would also differ in the following
order
a) AM>GM>HM
b) AM=GM=HM
c) AM<HM<GM
d) AM.GM>HM
5) Which of the following factors does not affect the width of a confidence interval?
i) Sample size
ii) Confidence desired
iii) Variability in the population
iv) Population size
6. The sum of squares for variance between samples is 8 and the sum of squares for variance
within samples is 24, then the sum of squares for total variance is
a) 16
b) 32
c) 48
d) 8
7) ‘The prices of cooking oils reduce after the harvesting of oil seeds and go up after some
time’ is an example of cyclic variations in a time series.
a) True
b) False
9) What test would you use to determine whether a set of observed frequencies differ from
their corresponding expected frequencies?
a) The t test for dependent samples
b) The Chi-Square test
c) The t test for independent samples
d) The F test
Section B
Short Answers (5 Marks each)
b) The probabilities that component A and component B of a machine will fail are 0.09
and 0.06 respectively. The machine will fail if any one of them fails. Find the probability
that it will fail?
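Ans.b Let P(A) = 0.09 be the probability that component A fails, and P(B) = 0.06 be the
probability that component B fails.
Assuming the two components fail independently, the probability that the machine does not
fail is the probability that neither component fails:
P(Not Failing) = (1 − 0.09)(1 − 0.06) = 0.91 × 0.94 = 0.8554
The probability that the machine fails is the complement:
P(Failing) = 1 − 0.8554
P(Failing) = 0.1446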
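The same result follows from a short calculation. This sketch assumes, as the value 0.8554 in the answer implies, that the two components fail independently.

```python
p_a = 0.09  # probability that component A fails
p_b = 0.06  # probability that component B fails

# Assumption: A and B fail independently of each other.
p_not_failing = (1 - p_a) * (1 - p_b)
p_failing = 1 - p_not_failing

print(f"P(Not Failing) = {p_not_failing:.4f}")
print(f"P(Failing) = {p_failing:.4f}")
```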
c) An unbiased coin is tossed six times. What is the probability that the tosses will result
in:
i) Exactly two heads
ii) At least five heads
Ans.c) P(X = k) = nCk ⋅ p^k ⋅ (1 − p)^(n − k)
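Applying the binomial formula with n = 6 tosses and p = 0.5 for an unbiased coin gives both required probabilities. A minimal sketch, where `binom_pmf` is an assumed helper name:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.5
exactly_two = binom_pmf(2, n, p)                          # i) exactly two heads
at_least_five = binom_pmf(5, n, p) + binom_pmf(6, n, p)   # ii) five or six heads

print(f"P(exactly 2 heads) = {exactly_two:.4f}")      # 15/64
print(f"P(at least 5 heads) = {at_least_five:.4f}")   # 7/64
```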
d) A production company has 350 hourly employees having average 37.6 years of age,
with a standard deviation of 8.3. If the sample average is 40 years of age and z-value is
2.07, calculate the required sample size.
Ans.d Given:
• A production company has 350 hourly employees with an average age of 37.6 years and
a standard deviation of 8.3
• The sample average is 40 years of age
• The z-value is 2.07
We know, $z = \dfrac{\bar{x} - \mu}{SE}$ and $SE = \dfrac{\sigma}{\sqrt{n}} = \dfrac{8.3}{\sqrt{n}}$
$z = \dfrac{\bar{x} - \mu}{8.3 / \sqrt{n}}$
$2.07 = \dfrac{40 - 37.6}{8.3 / \sqrt{n}}$
$\sqrt{n} = \dfrac{(2.07)(8.3)}{2.4} = 7.16$
$n = 51.25 \approx 51$
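Rearranging z = (x̄ − μ) / (σ / √n) for n gives n = (zσ / (x̄ − μ))². The sketch below evaluates this without intermediate rounding; the variable names are illustrative.

```python
z = 2.07        # given z-value
sigma = 8.3     # population standard deviation
x_bar = 40.0    # sample average age
mu = 37.6       # population average age

root_n = z * sigma / (x_bar - mu)  # square root of the sample size
n_exact = root_n ** 2
n_required = round(n_exact)        # nearest whole number, as in the answer

print(f"sqrt(n) = {root_n:.2f}, n = {n_exact:.2f}, rounded to {n_required}")
```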
e) Three varieties of crops ‘A’, ‘B’, and ‘C’ are tested in a randomized block design with
four replications. The yields are depicted in table 11.6. Test at 0.05 level of significance
whether there is a difference between replications. Test also whether the varieties differ
significantly. Answer the question taking a significant level of 5%.
Variety Replications
1 2 3 4
A 6 4 8 6
B 7 6 6 9
C 8 5 10 9
Ans.e N = 12, T = Sum of all values = 84
Correction Factor = $\dfrac{T^2}{N} = \dfrac{84^2}{12} = 588$
SST (Total Sum of Squares) = Sum of squares of all observations − $\dfrac{T^2}{N}$
= $(6^2 + 4^2 + 8^2 + 6^2 + \dots + 10^2 + 9^2) - 588 = 624 - 588 = 36$
The sum of squares between replications (columns) is 18 with 3 df, the sum of squares between
varieties (rows) is 8 with 2 df, and the error sum of squares is 36 − 18 − 8 = 10 with 6 df.
Hence MSC = 6, MSR = 4 and MSE = 1.67.
$F_c = \dfrac{MSC}{MSE} = \dfrac{6}{1.67} = 3.60$
$F_r = \dfrac{MSR}{MSE} = \dfrac{4}{1.67} = 2.40$
For Replication:
The calculated value of $F_c$ is 3.60. The table value of F for (3,6) df at 5% level of significance
is 4.76. Since the calculated value of F is less than the table value, we do not reject the null
hypothesis and conclude that the difference between replications is not significant.
For Variety:
The calculated value of $F_r$ is 2.40. The table value of F for (2,6) df at 5% level of significance
is 5.14. Since the calculated value of F is less than the table value, we do not reject the null
hypothesis and conclude that the difference between varieties is not significant.
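The sums of squares behind the two F ratios can be worked out step by step. The sketch below treats varieties as rows and replications as columns, matching the degrees of freedom quoted above. The variable names are illustrative, and the table values of F are not computed here.

```python
# Yields by variety (rows) and replication (columns).
yields = {"A": [6, 4, 8, 6], "B": [7, 6, 6, 9], "C": [8, 5, 10, 9]}
rows = list(yields.values())
r, c = len(rows), len(rows[0])            # 3 varieties, 4 replications
N = r * c
T = sum(sum(row) for row in rows)

cf = T ** 2 / N                                            # correction factor
sst = sum(x * x for row in rows for x in row) - cf         # total SS
ssr = sum(sum(row) ** 2 for row in rows) / c - cf          # between varieties
ssc = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf    # between replications
sse = sst - ssr - ssc                                      # error SS

msr = ssr / (r - 1)
msc = ssc / (c - 1)
mse = sse / ((r - 1) * (c - 1))
f_r = msr / mse    # F for varieties, (2, 6) df
f_c = msc / mse    # F for replications, (3, 6) df

print(f"SST={sst:.0f} SSR={ssr:.0f} SSC={ssc:.0f} SSE={sse:.0f}")
print(f"F (varieties) = {f_r:.2f}, F (replications) = {f_c:.2f}")
```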
Section C
Long Answers (10 Marks each)
1. Distinguish between:
2. A factory has three machines M1, M2 and M3. They produce 4000, 10,000 and 6,000
products per day. From past records, it is known that M1, M2, and M3 produce 5%, 4%,
and 8% defectives. A product is selected at random from the day’s production and is
found to be defective. What is the probability that it was not produced by machine M3?
Ans.2
Calculating the total number of defectives:
Total defectives = (Defectives from M1) + (Defectives from M2) + (Defectives from M3)
Total defectives = (0.05 * 4000) + (0.04 * 10000) + (0.08 * 6000)
Total defectives = 200 + 400 + 480
Total defectives = 1080
Defectives not produced by M3 = 200 + 400 = 600
P(not M3 | defective) = 600 / 1080 = 0.5556
Therefore, the probability that a defective product was not produced by machine M3 is
approximately 0.5556 or 55.56%.
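The counting argument above is Bayes' theorem applied to expected numbers of defectives. A short sketch, with illustrative dictionary names:

```python
# Daily output and defective rate for each machine.
output = {"M1": 4000, "M2": 10000, "M3": 6000}
defect_rate = {"M1": 0.05, "M2": 0.04, "M3": 0.08}

# Expected number of defectives per machine and in total.
defectives = {m: output[m] * defect_rate[m] for m in output}
total_defectives = sum(defectives.values())

# Posterior probability that a defective item came from each machine.
posterior = {m: d / total_defectives for m, d in defectives.items()}
p_not_m3 = posterior["M1"] + posterior["M2"]

print(f"Total defectives = {total_defectives:.0f}")
print(f"P(not M3 | defective) = {p_not_m3:.4f}")
```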
3. In a very large organisation, the director wanted to find out what proportions of the
employees prefer to provide their own retirement benefits in lieu of a company –
sponsored plan. A simple random sample of 75 employees was taken. It was found that
40%, that is, 0.4 of them are interested in providing their own retirement plans. The
management requests that we use this sample to find an interval about which they can
be 99 percent confident that it contains the true population proportion.
Ans.3
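Confidence Interval = $p \pm z_{critical} \times \sqrt{\dfrac{p(1-p)}{n}}$
Given:
- Sample proportion (p) = 0.4
- Sample size (n) = 75
- Confidence level = 99% (z-critical value = 2.576)
Standard error = $\sqrt{\dfrac{0.4 \times 0.6}{75}} = 0.05657$, so the margin of error is 2.576 × 0.05657 = 0.1457.
Therefore, the 99% confidence interval for the true population proportion is approximately
(0.2543, 0.5457).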
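The interval can be evaluated directly from the formula, using the z-critical value 2.576 stated in the answer. This is a sketch with illustrative variable names; it carries full precision, so hand calculations that round intermediate values may differ in the fourth decimal place.

```python
import math

p_hat = 0.4      # sample proportion
n = 75           # sample size
z_crit = 2.576   # z-critical value for 99% confidence, as given

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
margin = z_crit * se
lower, upper = p_hat - margin, p_hat + margin

print(f"SE = {se:.5f}, margin = {margin:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```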
4. Calculate Spearman’s rank correlation coefficient between the series A and B depicted
in table:
Series A 57 59 62 63 64 65 55 58 57
Series B 113 117 126 126 130 129 111 116 112
Ans.4
ASSIGNMENT QUESTIONS
SET -1
1 Define statistics. Explain various functions of statistics. Also discuss the key limitations
of statistics.
Answer 1: Statistics is a branch of mathematics that deals with the collection, organisation,
analysis, interpretation, and presentation of data. It offers techniques for condensing and
characterising different facets of data, which makes it easier to reach meaningful findings and
conclusions. Statistics has two main branches: descriptive statistics, which summarises and
presents data, and inferential statistics, which draws conclusions or predictions about a
population from a sample. Statistics is essential for research, decision-making, and
understanding patterns and trends in datasets in a variety of disciplines, including science,
business, economics, and the social sciences.
Functions of Statistics
4. Prediction: Based on past data, prediction uses statistical models to project future
trends or results. Predictive modelling frequently makes use of regression analysis and
time series analysis. Analysts can assist in planning and strategic decision-making by
using patterns in historical data to inform their projections.
Limitations of Statistics
1. Qualitative Data Exclusion: Statistics deals mostly with quantitative data, such
as numerical values and measurable quantities. Qualitative data, which refers to
traits, meanings, or attributes, is not directly analysed statistically. Inherently
qualitative phenomena such as beauty, intelligence, or emotions are difficult to
quantify directly.
2. Limitation with individual facts: Statistical methods work best when applied to
groups or aggregates of data rather than to individual facts. Examining a single,
isolated case is difficult, and statistical methods may not produce meaningful
insights about it. Statistics excels at revealing patterns and trends across
collections of data points.
2. Define Measurement Scales. Discuss Qualitative and Quantitative data in detail with
examples.
Answer 2: Measurement Scale
The term “measurement scale,” also called a scale of measurement or level of measurement,
describes how variables are categorised or classified according to the properties and nature of
the data they represent. Measurement scales determine the kinds of statistical analyses that
can be performed on variables and offer a framework for understanding their characteristics.
1. Qualitative Data: Qualitative data refers to non-numerical data that characterises the
traits, features, or attributes of a subject. This kind of data is categorical, meaning it represents
different labels or categories. Qualitative data is used to classify and categorise information
according to intrinsic qualities rather than numeric observations.
The two primary categories of qualitative data are ordinal and nominal.
a. Nominal Data: These are categorical data that lack any natural order or ranking and
simply represent distinct categories or groups. The categories are separate and
mutually exclusive. Without assuming any quantitative relationship, nominal
data only enable categories to be identified and distinguished. Nominal data examples include:
i. Shades of colour, such as yellow, pink, white.
ii. Types of animals, such as birds, dogs, and cats.
b. Ordinal Data: Ordinal data are categorical data as well, but they are distinguished from
nominal data by a meaningful ranking or order. Although the order matters, the gaps
between the categories are not consistent or measurable. Ordinal data make it possible
to compare relative positions, but the differences between ranks are not necessarily
equal. Ordinal data examples include:
i. Levels of education (such as graduate, college, or high school).
ii. Ratings of customer satisfaction (such as Excellent, Good, and Fair)
2. Quantitative Data: Quantitative data refers to numerical data that has measurable
values and can be stated numerically. With the use of mathematical computations and
statistical analysis, this kind of data offers a more thorough and accurate consideration of a
phenomenon. The two primary categories of quantitative data are interval and ratio data.
a. Interval Data: Quantitative data with equal intervals between values but no true zero
point on the scale is known as interval data. In other words, a zero value does not
imply the absence of the measured property, even though the numerical differences
between values are meaningful. Temperature readings in Celsius or Fahrenheit are
typical instances of interval data. On the Celsius scale, 0 denotes a particular point
rather than the complete absence of temperature.
b. Ratio Data: Ratio data are quantitative data with uniform spacing between values and
a true zero point. A value of zero denotes the total absence of the attribute being
measured, so ratio comparisons between values are meaningful (for example, a
weight of 40 kg is twice a weight of 20 kg). Ratio data examples include height,
weight, income, age, and other such measures. A height of zero, for instance, would
mean that there is no height at all.
3. Discuss the basic laws of Sampling theory. Define following Sampling techniques with
help of examples:
Stratified Sampling
Cluster Sampling
Answer 3:
Sampling Techniques
SET – 2
4. Define Business Forecasting. Explain various methods of Business Forecasting.
Answer 4:
BUSINESS FORECASTING
Business Forecasting is the methodical process of forecasting future trends, events, or business
results using a variety of quantitative and qualitative techniques, historical data, and analysis.
The aim of business forecasting is to help businesses plan for the future, make informed
decisions, and adjust to expected changes. Forecasting future demand for goods and services
entails assessing market trends, projecting financial results, and taking potential effects on the
business environment into account.
Methods of Business Forecasting
1. Business Barometers: Using business barometers means monitoring and
evaluating leading indicators, that is, economic factors which typically shift before the
economy as a whole. For example, to predict changes in the economy and make sound
choices about inventory control and marketing tactics, a retail company may monitor
consumer confidence indexes, interest rates, and stock market patterns.
2. Time Series Analysis: Examining past data points in sequence over time in
order to spot trends, patterns, and seasonality is known as time series analysis. For
example, to estimate future production needs more accurately, a manufacturing
corporation could analyse quarterly production data spanning several years in order to
discover repeating patterns.
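One simple way to expose the trend in quarterly data of this kind is a four-quarter moving average, which smooths out the seasonal swing within each year. The sketch below uses made-up production figures (hypothetical, not taken from the text), and `moving_average` is an assumed helper name.

```python
def moving_average(values, window):
    """Simple moving average over consecutive windows of the given length."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical quarterly production figures for two years (illustration only).
production = [120, 150, 170, 130, 128, 158, 181, 140]

trend = moving_average(production, 4)  # four-quarter moving average
print(trend)
```

A rising sequence of averages suggests an upward trend once the seasonal pattern has been averaged out.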
Index numbers are useful in a variety of industries and have multiple uses. Several important
facets of their usefulness are as follows:
1. Comparison of Time Periods: Index numbers make it possible to compare
variables or data points from various time periods. Analysts can evaluate the
proportional changes or trends over time by defining a base period or base value.
3. Inflation and Price Changes: Inflation and price fluctuations are commonly
gauged using index numbers. Examples that quantify variations in the cost of living and
manufacturing costs, respectively, are the Consumer Price Index (CPI) and the Producer
Price Index (PPI).
6. Discuss various types of Estimators. Also explain the criteria of a good estimator.
Answer 6: TYPES OF ESTIMATORS
The following are the two types of estimators:
1. Point Estimator: An estimator that gives a single-value estimate for the
parameter of interest is called a point estimator. With the goal of coming as close as
feasible to the actual value of the population parameter, it reduces the data from a sample
to a single numerical value. The sample mean, sample variance, and sample proportion
are typical examples. Point estimators are simple and easy to understand, but they give
no information about the variability or reliability of the estimate.