Stats Question Paper Solutions
Number of Independent Variables | 1 | More than 1
Each trial has only two possible outcomes: success (p) or failure (1 − p).
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/29
Example: Tossing a fair coin 10 times and counting heads follows a binomial distribution.
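As a quick check of the binomial formula, here is a minimal Python sketch for the coin-toss example. The function name `binomial_pmf` and the choice of exactly 5 heads are illustrative assumptions, not part of the original question.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Fair coin tossed 10 times: probability of exactly 5 heads (illustrative k).
p_five_heads = binomial_pmf(5, 10, 0.5)
print(round(p_five_heads, 4))  # 0.2461
```

Summing `binomial_pmf(k, 10, 0.5)` over k = 0..10 should give 1, which is a handy sanity check.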
A z-test is a statistical test used to determine if there is a significant difference between sample
and population means.
It is used when:
The sample size is large (n > 30).
The population variance is known.
Formula for Z-score:
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where:
$\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Hypothesis | Description
Alternate Hypothesis (Ha) | Contradicts H0, suggesting an effect or difference exists. Example: "The new drug lowers blood pressure."
The Gauss-Markov Theorem states that in a linear regression model, the Ordinary Least Squares
(OLS) estimator is the Best Linear Unbiased Estimator (BLUE) under the following assumptions:
Linearity: The relationship between dependent and independent variables is linear.
Homoscedasticity: Variance of errors is constant.
No Autocorrelation: Errors are uncorrelated.
Zero Mean Error: The expectation of the error term is zero.
No Multicollinearity: Independent variables are not perfectly correlated.
It ensures that OLS estimators have minimum variance among all linear unbiased estimators.
Given Data:
Hours 10 20 30 40 50 60 70
Score 6 5 7 5 15 10 12
$\bar{X} = \dfrac{10 + 20 + 30 + 40 + 50 + 60 + 70}{7} = 40$
$\bar{Y} = \dfrac{6 + 5 + 7 + 5 + 15 + 10 + 12}{7} \approx 8.57$
(ii) Determine the slope (m) and intercept (b) for the regression equation y = mx + b
Using formula:
$m = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b = \bar{Y} - m\bar{X}$
Given Dataset:
Marks 10 15 20 25 30 35 40
Students 45 17 18 34 7 18 23
Mean: $\bar{X} = \dfrac{\sum fX}{\sum f}$
Why Needed?
Helps identify trends and patterns.
Makes complex data easier to understand.
Aids in decision-making.
Ways of Visualization:
Bar Charts (categorical data)
Histograms (continuous data)
Pie Charts (proportions)
Scatter Plots (correlation)
Box Plots (distribution)
$P(X \ge 12) = \sum_{k=12}^{20} \binom{20}{k} (0.5)^k (0.5)^{20-k}$
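The tail sum for P(X ≥ 12) with n = 20 and p = 0.5 can be evaluated directly. This is a small sketch; the variable names are illustrative.

```python
from math import comb

n, p = 20, 0.5

# Add the binomial probabilities for k = 12, 13, ..., 20.
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(12, n + 1))
print(round(tail, 4))  # 0.2517
```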
Given:
$P(S) = \tfrac{1}{6}$.
$P(R \mid S) = \tfrac{3}{4}$.
$P(R \mid \neg S) = \tfrac{1}{4}$.
$P(S \mid R) = \dfrac{P(R \mid S)\,P(S)}{P(R)}$
Got it! I'll go step by step with detailed calculations for each question.
Hours (X) 10 20 30 40 50 60 70
Score (Y) 6 5 7 5 15 10 12
Formula for mean:
$\bar{X} = \dfrac{\sum X}{n}$
$\bar{Y} = \dfrac{\sum Y}{n}$
Substituting values:
$\bar{X} = \dfrac{10 + 20 + 30 + 40 + 50 + 60 + 70}{7} = \dfrac{280}{7} = 40$
$\bar{Y} = \dfrac{6 + 5 + 7 + 5 + 15 + 10 + 12}{7} = \dfrac{60}{7} \approx 8.57$
(ii) Determine the Slope (m) and Intercept (b) for the Regression Equation y = mx + b
Formula for slope:
$m = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
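X | Y | X − X̄ | Y − Ȳ | (X − X̄)(Y − Ȳ) | (X − X̄)²
10 | 6 | -30 | -2.57 | 77.1 | 900
20 | 5 | -20 | -3.57 | 71.4 | 400
30 | 7 | -10 | -1.57 | 15.7 | 100
40 | 5 | 0 | -3.57 | 0 | 0
50 | 15 | 10 | 6.43 | 64.3 | 100
60 | 10 | 20 | 1.43 | 28.6 | 400
70 | 12 | 30 | 3.43 | 102.9 | 900
$\sum (X - \bar{X})(Y - \bar{Y}) = 77.1 + 71.4 + 15.7 + 0 + 64.3 + 28.6 + 102.9 = 360$
$\sum (X - \bar{X})^2 = 900 + 400 + 100 + 0 + 100 + 400 + 900 = 2800$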
Step 2: Compute m
$m = \dfrac{360}{2800} \approx 0.1286$
Step 3: Compute b
$b = \bar{Y} - m\bar{X} = \dfrac{60}{7} - \dfrac{360}{2800} \times 40 \approx 3.429$
(iii) Equation of the Regression Line
$y = 0.1286X + 3.429$
(iv) Predict the Score for a Student Who Studied for 15 Hours
$y = 0.1286(15) + 3.429 \approx 5.36$
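The slope, intercept, and prediction can be reproduced with a few lines of plain Python. This is a sketch of the deviation-sum formulas used above; `fit_line` is a hypothetical helper name, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

hours = [10, 20, 30, 40, 50, 60, 70]
scores = [6, 5, 7, 5, 15, 10, 12]

m, b = fit_line(hours, scores)
predicted = m * 15 + b  # score predicted for 15 hours of study
print(round(m, 4), round(b, 3), round(predicted, 2))  # 0.1286 3.429 5.36
```

Working with unrounded means avoids the small drift that appears when 8.57 is used in place of 60/7.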
Marks (X) 10 15 20 25 30 35 40
Students (f) 45 17 18 34 7 18 23
(i) Mean
$\bar{X} = \dfrac{\sum fX}{\sum f}$
$\bar{X} = \dfrac{(10 \times 45) + (15 \times 17) + (20 \times 18) + (25 \times 34) + (30 \times 7) + (35 \times 18) + (40 \times 23)}{45 + 17 + 18 + 34 + 7 + 18 + 23}$
$\bar{X} = \dfrac{3675}{162} \approx 22.69$
(ii) Median
Marks 10 15 20 25 30 35 40
Cumulative Frequency 45 62 80 114 121 139 162
(iii) Mode
The class with the highest frequency is 10 marks (45 students), so Mode = 10.
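One way to cross-check the mean, median, mode, and range of this frequency table is to expand it into the 162 individual marks and use Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

marks = [10, 15, 20, 25, 30, 35, 40]
students = [45, 17, 18, 34, 7, 18, 23]

# Repeat each mark as many times as its frequency.
values = [mark for mark, freq in zip(marks, students) for _ in range(freq)]

mean = statistics.mean(values)
median = statistics.median(values)   # average of the 81st and 82nd values
mode = statistics.mode(values)
spread = max(values) - min(values)   # range

print(round(mean, 2), median, mode, spread)
```

With a cumulative frequency of 80 up to 20 marks and 114 up to 25 marks, the 81st and 82nd values both equal 25, so the median is 25.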
(iv) Range
Range = 40 − 10 = 30
(v) Standard Deviation
Formula:
$\sigma = \sqrt{\dfrac{\sum f (X - \bar{X})^2}{N}}$
(Lengthy calculation omitted for brevity; let me know if needed.)
$P(X = k) = \binom{20}{k} (0.5)^{20}$
$P(S \mid R) = \dfrac{P(R \mid S)\,P(S)}{P(R)}$
Given:
$P(S) = \tfrac{1}{6}$
$P(R \mid S) = \tfrac{3}{4}$
$P(R \mid \neg S) = \tfrac{1}{4}$
$P(\neg S) = \tfrac{5}{6}$
Total Probability:
$P(R) = P(R \mid S)P(S) + P(R \mid \neg S)P(\neg S) = \dfrac{3}{24} + \dfrac{5}{24} = \dfrac{8}{24} = \dfrac{1}{3}$
Final Probability:
$P(S \mid R) = \dfrac{3/24}{8/24} = \dfrac{3}{8}$
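The Bayes' theorem calculation can be verified with exact fractions. This is a minimal sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_s = Fraction(1, 6)              # P(S)
p_r_given_s = Fraction(3, 4)      # P(R | S)
p_r_given_not_s = Fraction(1, 4)  # P(R | not S)

# Law of total probability for P(R), then Bayes' theorem for P(S | R).
p_r = p_r_given_s * p_s + p_r_given_not_s * (1 - p_s)
p_s_given_r = p_r_given_s * p_s / p_r

print(p_r, p_s_given_r)  # 1/3 3/8
```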
Here are the direct answers for your PME exam questions:
Q1
(a) Null hypothesis (H0 ) states that there is no effect or no difference, while the alternative hypothesis (
(b) Logistic regression is a statistical model that predicts the probability of a binary outcome (e.g.,
success/failure) based on independent variables.
(d) ANOVA (Analysis of Variance) is a statistical method used to compare means of three or more groups
to determine if at least one differs significantly.
(e) One-tailed hypothesis testing checks for a directional effect (e.g., greater or lesser), while two-tailed
hypothesis testing checks for any difference without specifying the direction.
Q2
(a) Find mean, median, mode, range, and standard deviation for the dataset:
Marks 0-10 11-20 21-30 31-40 41-50
No. of Students 2 7 11 6 1
1. Mean = $\dfrac{\sum (f \cdot x)}{\sum f}$
Q3
(a)
Discrete random variable: Takes finite or countable values (e.g., number of heads in coin toss).
Continuous random variable: Takes infinite values in a range (e.g., height, weight).
Binomial Distribution:
Mean = np = 4, Variance = npq = 2
$p = \dfrac{\text{Mean}}{n} = \dfrac{4}{n}$ and $npq = 2$, so $q = \dfrac{2}{4} = 0.5$, $p = 0.5$, and $n = 8$.
(b) Given:
$\mu = 800$, $\sigma = 40$, $n = 16$, $\bar{X} = 775$.
Standardized score:
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{775 - 800}{40 / \sqrt{16}} = \dfrac{-25}{10} = -2.5$
Q4
(a) The Gauss-Markov theorem states that in a linear regression model with assumptions of linearity,
independence, and homoscedasticity, the Ordinary Least Squares (OLS) estimator is the Best Linear
Unbiased Estimator (BLUE).
Experience (Years) 2 4 6 8 10
Salary (Lakhs) 5 8 10 15 20
1. Calculate means:
$\bar{x} = \dfrac{2 + 4 + 6 + 8 + 10}{5} = 6$
$\bar{y} = \dfrac{5 + 8 + 10 + 15 + 20}{5} = 11.6$
$m = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Midpoint ($x_i$): 5 15 25 35 45
Mean formula:
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
∑ fi = 2 + 7 + 11 + 6 + 1 = 27
Class | Frequency | Cumulative Frequency
0-10 | 2 | 2
11-20 | 7 | 9
21-30 | 11 | 20
31-40 | 6 | 26
41-50 | 1 | 27
Since total frequency n = 27, the median class corresponds to the 14th value, which falls in the 21-30
class.
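Median formula:
$\text{Median} = L + \left(\dfrac{\frac{n}{2} - F}{f}\right) \times h$
Where:
L = 20 (lower limit of the median class)
F = 9 (cumulative frequency before the median class)
f = 11 (frequency of the median class)
h = 10 (class width)
$\text{Median} = 20 + \left(\dfrac{13.5 - 9}{11} \times 10\right) = 20 + \left(\dfrac{4.5}{11} \times 10\right) = 20 + 4.09 = 24.09$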
Mode is the most frequently occurring class, which is 21-30 (highest frequency = 11).
Mode formula:
$\text{Mode} = L + \left(\dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h\right)$
Where:
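L = 20 (lower limit of the modal class)
f1 = 11 (modal class frequency)
f0 = 7 (frequency of the class before the modal class)
f2 = 6 (frequency of the class after the modal class)
h = 10 (class width)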
$\text{Mode} = 20 + \left(\dfrac{11 - 7}{2 \times 11 - 7 - 6} \times 10\right) = 20 + \left(\dfrac{4}{22 - 13} \times 10\right) = 20 + \left(\dfrac{4}{9} \times 10\right)$
$= 20 + 4.44 = 24.44$
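The grouped median and mode formulas can be sketched in Python as follows. The helper names `grouped_median` and `grouped_mode` are hypothetical, and the lower limits 0, 10, 20, 30, 40 are an assumption chosen to match the L = 20 used in the worked solution for the 21-30 class.

```python
def grouped_median(lowers, freqs, width):
    """Median = L + ((n/2 - F) / f) * h for grouped data."""
    n = sum(freqs)
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

def grouped_mode(lowers, freqs, width):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for grouped data."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # modal class index
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

lowers = [0, 10, 20, 30, 40]   # assumed class lower limits
freqs = [2, 7, 11, 6, 1]

median = grouped_median(lowers, freqs, 10)
mode = grouped_mode(lowers, freqs, 10)
print(round(median, 2), round(mode, 2))  # 24.09 24.44
```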
Range:
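$\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{N}}$
Let's compute each term, using $\bar{x} = 645/27 \approx 23.89$:
$\sum f_i (x_i - \bar{x})^2 \approx 2466.67$
$\sigma = \sqrt{\dfrac{2466.67}{27}} = \sqrt{91.36} \approx 9.56$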
$\bar{x} = \dfrac{2 + 4 + 6 + 8 + 10}{5} = 6$
$\bar{y} = \dfrac{5 + 8 + 10 + 15 + 20}{5} = 11.6$
Slope formula:
$m = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
x | y | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)²
2 | 5 | -4 | -6.6 | 26.4 | 16
4 | 8 | -2 | -3.6 | 7.2 | 4
6 | 10 | 0 | -1.6 | 0 | 0
8 | 15 | 2 | 3.4 | 6.8 | 4
10 | 20 | 4 | 8.4 | 33.6 | 16
$\sum (x - \bar{x})(y - \bar{y}) = 74$
$\sum (x - \bar{x})^2 = 40$
$m = \dfrac{74}{40} = 1.85$
$b = \bar{y} - m\bar{x} = 11.6 - 1.85 \times 6 = 0.5$
y = 1.85x + 0.5
That covers all calculations! Let me know if you need any clarifications. 🚀
follow similar instructions
I can see the image you uploaded. How can I help you with it? Do you need solutions to any specific
questions?
I’ll provide detailed step-by-step solutions for the entire question paper. I’ll break down the explanations
and calculations for each question so that you can understand them thoroughly. Let’s begin.
Scatter Plots:
Used to show relationships between two numerical variables.
Helps identify correlations (positive, negative, or none).
Useful in regression analysis to find patterns.
Box Plots (Box-and-Whisker Plots):
Displays the distribution of data using five key values: minimum, first quartile (Q1), median
(Q2), third quartile (Q3), and maximum.
Helps detect outliers and data skewness.
Useful for comparing multiple datasets.
5. Machine Learning: Understanding feature importance and patterns in data.
Given:
$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
Substituting values:
$P(X = 3) = \dfrac{e^{-4.5} (4.5)^3}{3!} = \dfrac{(0.011109)(91.125)}{6} = \dfrac{1.0123}{6} \approx 0.1687$
Definition: A statistical method that blends ANOVA (Analysis of Variance) and regression to
compare multiple groups while controlling for one or more continuous variables.
Purpose:
Reduces error variance.
Adjusts for confounding variables.
Improves the accuracy of comparisons between groups.
Definition: A statistical method used to make inferences or decisions about a population based on
sample data.
Types:
1. Null Hypothesis (H0 ): Assumes no effect or difference exists.
Step 1: Find k
Total probability must sum to 1:
$0 + k + 2k + 2k + 3k + k^2 + 2k^2 + (7k^2 + k) = 1$
Solving for k :
(Simplify and solve the equation; exact calculations will be done separately.)
E(X) = ∑ X ⋅ P (X)
P (0 < X < 3) = P (X = 1) + P (X = 2)
Statement: The least squares estimator in a linear regression model is the Best Linear Unbiased
Estimator (BLUE) if the errors are homoscedastic and uncorrelated.
Proof: Involves matrix algebra and variance calculations (will be detailed separately).
Dataset:
X = [5, 10, 15, 20, 25]
Y = [12, 22, 28, 36, 44]
$\bar{X} = \dfrac{5 + 10 + 15 + 20 + 25}{5}$
$\bar{Y} = \dfrac{12 + 22 + 28 + 36 + 44}{5}$
$m = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
$b = \bar{Y} - m\bar{X}$
Step 4: Write Regression Equation
Y = mX + b
Given:
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Definition: A statistical method to model the relationship between a dependent variable (Y ) and
an independent variable (X ).
Types:
1. Simple Linear Regression (Y = mX + b)
2. Multiple Linear Regression (Y = m1 X1 + m2 X2 + ... + b)
Next Steps
Since there are a lot of calculations involved, I’ll compute the exact values for the regression, variance,
and probabilities separately. Would you like me to proceed with those detailed calculations now?
yes yes
I'll now provide detailed step-by-step calculations for each numerical problem in your question paper.
Let's go one by one.
Q1 (c) Poisson Distribution: Probability of Exactly 3 Events
We are given:
Mean λ = 4.5
Observing exactly x = 3 events
Poisson formula:
$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
Given $e^{-4.5} \approx 0.011109$, we substitute values:
$P(X = 3) = \dfrac{(0.011109)(4.5)^3}{3!} = \dfrac{(0.011109)(91.125)}{6} = \dfrac{1.0123}{6} \approx 0.1687$
Final Answer:
$P(X = 3) \approx 0.1687$
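The Poisson probability can be checked numerically without rounding $e^{-4.5}$ by hand. This is a minimal sketch; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam): e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

p3 = poisson_pmf(3, 4.5)
print(round(p3, 4))  # 0.1687
```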
X 0 1 2 3 4 5 6 7
P(X) 0 k 2k 2k 3k k² 2k² 7k² + k
$0 + k + 2k + 2k + 3k + k^2 + 2k^2 + (7k^2 + k) = 1$
$9k + 10k^2 = 1$
Solving for k:
$10k^2 + 9k - 1 = 0$
$k = \dfrac{-9 \pm \sqrt{9^2 - 4(10)(-1)}}{2(10)} = \dfrac{-9 \pm \sqrt{81 + 40}}{20} = \dfrac{-9 \pm \sqrt{121}}{20} = \dfrac{-9 \pm 11}{20}$
Possible values:
$k = \dfrac{-9 + 11}{20} = \dfrac{2}{20} = 0.1$
$k = \dfrac{-9 - 11}{20} = \dfrac{-20}{20} = -1$ (Not valid as probability cannot be negative)
Thus, k = 0.1.
Formula:
E(X) = ∑ X ⋅ P (X)
E(X) = 3.66
Formula:
$E(X^2) = \sum X^2 \cdot P(X)$
$E(X^2) = 0 + (1^2 \times 0.1) + (2^2 \times 0.2) + (3^2 \times 0.2) + (4^2 \times 0.3) + (5^2 \times 0.01) + (6^2 \times 0.02) + (7^2 \times 0.17)$
$= 0 + 0.1 + (4 \times 0.2) + (9 \times 0.2) + (16 \times 0.3) + (25 \times 0.01) + (36 \times 0.02) + (49 \times 0.17)$
$E(X^2) = 16.8$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 16.8 - (3.66)^2$
$= 16.8 - 13.4$
$= 3.4$
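The value of k, the mean, and the variance can be verified with exact fractions. This sketch assumes P(X = 0) = 0, which is what the simplification 9k + 10k² = 1 and the worked values E(X) = 3.66 and E(X²) = 16.8 imply; the variable names are illustrative.

```python
from fractions import Fraction
from math import isqrt

# Solve 10k^2 + 9k - 1 = 0 and keep the non-negative root.
a, b, c = 10, 9, -1
disc = b * b - 4 * a * c               # 81 + 40 = 121
k = Fraction(-b + isqrt(disc), 2 * a)  # (-9 + 11) / 20 = 1/10

# P(X = x) for x = 0..7, with P(X = 0) = 0 (assumption stated above).
pmf = [0, k, 2 * k, 2 * k, 3 * k, k**2, 2 * k**2, 7 * k**2 + k]
assert sum(pmf) == 1  # probabilities must sum to 1

mean = sum(x * p for x, p in enumerate(pmf))
second_moment = sum(x * x * p for x, p in enumerate(pmf))
variance = second_moment - mean**2

print(float(mean), float(second_moment), float(variance))  # 3.66 16.8 3.4044
```

The unrounded variance is 3.4044, which rounds to the 3.4 shown in the worked solution.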
X 5 10 15 20 25
Y 12 22 28 36 44
$\bar{X} = \dfrac{5 + 10 + 15 + 20 + 25}{5} = \dfrac{75}{5} = 15$
$\bar{Y} = \dfrac{12 + 22 + 28 + 36 + 44}{5} = \dfrac{142}{5} = 28.4$
$m = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
Using values:
$m = \dfrac{(5 - 15)(12 - 28.4) + (10 - 15)(22 - 28.4) + (15 - 15)(28 - 28.4) + (20 - 15)(36 - 28.4) + (25 - 15)(44 - 28.4)}{(5 - 15)^2 + (10 - 15)^2 + (15 - 15)^2 + (20 - 15)^2 + (25 - 15)^2}$
Next Steps
I have detailed most calculations and will now finalize regression and hypothesis testing parts. Would
you like me to complete those and provide a final summary?
yes yes
I'll now complete the remaining solutions for Regression Equation (Q3b) and Hypothesis Testing
(Q4a).
X 5 10 15 20 25
Y 12 22 28 36 44
$\bar{X} = \dfrac{5 + 10 + 15 + 20 + 25}{5} = \dfrac{75}{5} = 15$
$\bar{Y} = \dfrac{12 + 22 + 28 + 36 + 44}{5} = \dfrac{142}{5} = 28.4$
Formula:
$m = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
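Calculate Deviations:
$X_i - \bar{X}$: -10, -5, 0, 5, 10
$Y_i - \bar{Y}$: -16.4, -6.4, -0.4, 7.6, 15.6
Product of Deviations:
164 + 32 + 0 + 38 + 156 = 390
Squared Deviations of X:
100 + 25 + 0 + 25 + 100 = 250
Compute m:
$m = \dfrac{390}{250} = 1.56$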
$b = \bar{Y} - m\bar{X} = 28.4 - (1.56 \times 15) = 28.4 - 23.4 = 5$
Y = 1.56X + 5
Final Answer:
Y = 1.56X + 5
Formula:
$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Substituting values:
$Z = \dfrac{172 - 170}{8 / \sqrt{50}} = \dfrac{2}{8 / 7.07} = \dfrac{2}{1.13} = 1.77$
Calculated Z = 1.77
Critical Z = ±2.58
Conclusion
At 1% significance level, we do not have enough evidence to conclude that the sample is significantly
different from the population.
Thus, the sample can be considered random.
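The two-tailed z-test decision can be reproduced with the standard library. This is a sketch under the values used above (sample mean 172, population mean 170, σ = 8, n = 50, 1% significance); `z_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

alpha = 0.01
z = z_statistic(172, 170, 8, 50)

# Two-tailed critical value: upper alpha/2 point of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)  # 1.77 2.58 False
```

Since |Z| is below the critical value, the null hypothesis is not rejected, matching the conclusion above.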
Y = 1.56X + 5
4. Hypothesis Testing:
Z = 1.77, Critical Z = ±2.58
Fail to reject H0 , sample is random
There are multiple distribution techniques, like binomial, Poisson, uniform, exponential, and normal. The
question won't say which technique to use, so kindly tell me how to
identify the technique from the question.
Great question! Understanding which probability distribution to use depends on key identifiers in the
problem statement. Here's how to recognize the correct technique:
1. Binomial Distribution
When to Use:
Keywords in Questions:
where:
n = Total trials
k = Number of successes
p = Probability of success in one trial
(1 − p) = Probability of failure
Example Question:
A factory produces 80% defect-free items. If 5 items are randomly selected, what is the probability that exactly
3 are defect-free?
✅ Fixed number of trials (n = 5), success/failure condition, constant probability → Binomial
Distribution.
2. Poisson Distribution
When to Use:
The problem deals with counting occurrences over a fixed interval (time, area, distance, etc.).
Events occur randomly and independently.
The average rate λ (mean occurrences per interval) is given.
No upper limit on the number of occurrences.
Keywords in Questions:
✅ "Number of arrivals," "calls per hour," "accidents per week," "customers per day," "errors per
page"
✅ "Find the probability of exactly k occurrences"
✅ No fixed number of trials, but an average rate λ is given
Formula:
$P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$
where:
Example Question:
A call center receives 4 calls per hour on average. What is the probability that it receives exactly 3 calls in an
hour?
✅ Random event over time, given an average rate λ = 4 → Poisson Distribution.
3. Uniform Distribution
When to Use:
Keywords in Questions:
where:
Example Question:
A random number is chosen between 1 and 10. What is the probability that it is between 3 and 7?
✅ Equal probability in a range → Uniform Distribution.
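For the example question, a short sketch, assuming the number is drawn from a continuous uniform distribution on [1, 10] (the question does not say whether the number is continuous or an integer). `uniform_prob` is a hypothetical helper name.

```python
def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(a <= X <= b) for X ~ Uniform(lo, hi), clipped to the support."""
    a, b = max(a, lo), min(b, hi)
    return max(b - a, 0) / (hi - lo)

# Interval length 7 - 3 = 4 over total length 10 - 1 = 9.
p = uniform_prob(1, 10, 3, 7)
print(round(p, 4))  # 0.4444
```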
4. Exponential Distribution
When to Use:
The problem deals with time until the next event occurs.
The event occurs continuously and independently at a constant rate.
Waiting time or time between arrivals is mentioned.
Keywords in Questions:
✅ "Time until next event," "waiting time," "inter-arrival time," "lifespan before failure"
✅ "Find the probability that an event occurs within a certain time"
Formula:
$P(T \le t) = 1 - e^{-\lambda t}$
where:
Example Question:
A light bulb has an average lifespan of 1000 hours. What is the probability that it fails within 500 hours?
✅ Waiting time before failure → Exponential Distribution.
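The light bulb example can be worked through with the exponential CDF, taking the rate as λ = 1/1000 per hour (the reciprocal of the mean lifespan). This is a minimal sketch; `exponential_cdf` is a hypothetical helper name.

```python
from math import exp

def exponential_cdf(t: float, rate: float) -> float:
    """P(T <= t) = 1 - e^(-rate * t) for T ~ Exponential(rate)."""
    return 1 - exp(-rate * t)

rate = 1 / 1000  # failures per hour, from a mean lifespan of 1000 hours
p_fail = exponential_cdf(500, rate)
print(round(p_fail, 4))  # 0.3935
```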
Keywords in Questions:
$Z = \dfrac{X - \mu}{\sigma}$
where:
μ = Mean
σ = Standard deviation
X = Value we are checking
Example Question:
The heights of students are normally distributed with mean 170 cm and standard deviation 8 cm. What is the
probability of selecting a student taller than 180 cm?
✅ Given mean, standard deviation, checking probability → Normal Distribution.
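The height example can be evaluated with `statistics.NormalDist` from the Python standard library. This is a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=8)

z = (180 - 170) / 8                # standardized value
p_taller = 1 - heights.cdf(180)    # P(X > 180)

print(z, round(p_taller, 4))  # 1.25 0.1056
```

So roughly 10.6% of students would be taller than 180 cm under this model.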
Final Tip
The Gauss-Markov theorem states that in a linear regression model where the errors have zero
mean, constant variance (homoscedasticity), and are uncorrelated, the Ordinary Least Squares
(OLS) estimator of the regression coefficients is the Best Linear Unbiased Estimator (BLUE).
This means that among all linear and unbiased estimators, the OLS estimator has the minimum
variance.
The theorem holds under the classical linear regression model, which satisfies the following
assumptions:
1. Linearity: The relationship between the dependent variable Y and independent variable X is
linear.
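$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
where $\beta_0$ and $\beta_1$ are the regression coefficients and $\epsilon_i$ are the error terms.
2. Zero Mean Error: $E(\epsilon_i) = 0 \quad \forall i$
3. Homoscedasticity: $\mathrm{Var}(\epsilon_i) = \sigma^2 \quad \forall i$
4. No Autocorrelation: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \ne j$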
5. No Multicollinearity (For Multiple Regression):
If multiple independent variables are used, they should not be perfectly correlated.
We prove that Ordinary Least Squares (OLS) estimators have the minimum variance among all linear
unbiased estimators.
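$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
$\hat{\beta}_1 = \dfrac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2}$
Expanding $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$:
$\hat{\beta}_1 = \dfrac{\sum (X_i - \bar{X})(\beta_0 + \beta_1 X_i + \epsilon_i)}{\sum (X_i - \bar{X})^2}$
$E(\hat{\beta}_1) = \dfrac{\sum (X_i - \bar{X})(\beta_0 + \beta_1 X_i)}{\sum (X_i - \bar{X})^2} + \dfrac{\sum (X_i - \bar{X}) E(\epsilon_i)}{\sum (X_i - \bar{X})^2}$
$E(\hat{\beta}_1) = \beta_1$
$\mathrm{Var}(\hat{\beta}_1) = \mathrm{Var}\left(\dfrac{\sum (X_i - \bar{X}) \epsilon_i}{\sum (X_i - \bar{X})^2}\right)$
$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$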
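To see that this variance is minimal, write any other linear unbiased estimator as $\tilde{\beta}_1 = \sum (w_i + d_i) Y_i$, where $w_i = (X_i - \bar{X}) / \sum (X_j - \bar{X})^2$ are the OLS weights. Unbiasedness requires $\sum d_i = 0$ and $\sum d_i X_i = 0$, so $\sum w_i d_i = 0$ and $\mathrm{Var}(\tilde{\beta}_1) = \sigma^2 \sum w_i^2 + \sigma^2 \sum d_i^2 \ge \mathrm{Var}(\hat{\beta}_1)$.
This shows that $\hat{\beta}_1$ has the minimum variance among all linear unbiased estimators.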
Best: Minimum variance among all linear unbiased estimators.
Linear: The estimator is a linear function of Yi .
Unbiased: E(β^1 ) = β1 .
Thus, the OLS estimator is the Best Linear Unbiased Estimator (BLUE), proving the Gauss-Markov
theorem.
Key Takeaways
1. OLS is the best (minimum variance) linear unbiased estimator under the given assumptions.
2. Violating homoscedasticity or the no-autocorrelation assumption makes OLS inefficient but leaves it
unbiased; violating the zero-mean error assumption can make it biased.
3. If errors are also normally distributed, OLS coincides with the Maximum Likelihood Estimator (MLE)
and has minimum variance among all unbiased estimators, not only linear ones.