ass2_solns
Policies:
For all multiple-choice questions, note that multiple correct answers may exist. However, selecting
an incorrect option will cancel out a correct one. For example, if you select two answers, one
correct and one incorrect, you will receive zero points for that question. Similarly, if the number
of incorrect answers selected exceeds the correct ones, your score for that question will be zero.
Please note that it is not possible to receive negative marks. You must select all the correct
options to get full marks for the question.
While the syllabus initially indicated the need to submit a paragraph explaining the use of AI or
other resources in your assignments, this requirement no longer applies as we are now utilizing
eClass quizzes instead of handwritten submissions. Therefore, you are not required to submit any
explanation regarding the tools or resources (such as online tools or AI) used in completing this
quiz.
This PDF version of the questions has been provided for your convenience should you wish to print
them and work offline.
Only answers submitted through the eClass quiz system will be graded. Please do not
submit a written copy of your responses.
Question 1. [1 mark]
Suppose you flip three coins. Suppose the first coin is represented by random variable X1 ∈ {0, 1},
the second coin by X2 ∈ {0, 1}, and the third coin by X3 ∈ {0, 1}. Which of the following is the
outcome space of the random variable X = (X1, X2, X3)?
a. {0, 1}^3
b. {1, 2, 3}
c. {(x1, x2, x3) | x1 ∈ {0, 1}, x2 ∈ {0, 1}, x3 ∈ {0, 1}}
d. {0, 1}
Solution:
The correct answers are:
• a. {0, 1}^3
• c. {(x1, x2, x3) | x1 ∈ {0, 1}, x2 ∈ {0, 1}, x3 ∈ {0, 1}}
Explanation:
The outcome space of the random variable X = (X1, X2, X3) consists of all possible combinations
of the coin flips, where each Xi can be either 0 or 1.
a. {0, 1}^3: Correct. This notation represents the Cartesian product of {0, 1} taken three times,
resulting in all 3-tuples where each component is 0 or 1.
Fall 2024 CMPUT 267: Basics of Machine Learning
b. {1, 2, 3}: Incorrect. This set does not represent the possible outcomes of the coin flips.
c. {(x1, x2, x3) | x1 ∈ {0, 1}, x2 ∈ {0, 1}, x3 ∈ {0, 1}}: Correct. This explicitly defines the set of
all 3-tuples where each xi is either 0 or 1.
d. {0, 1}: Incorrect. This set includes only two outcomes and does not account for all combinations
of three coin flips.
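The equivalence of options a and c can be checked by enumerating the Cartesian product directly. This is a minimal sketch; the variable name `outcome_space` is chosen here for illustration and does not come from the assignment.

```python
from itertools import product

# Enumerate {0, 1}^3: every 3-tuple whose components are each 0 or 1.
outcome_space = set(product([0, 1], repeat=3))

# The same set written in set-builder form, as in option c.
set_builder = {(x1, x2, x3) for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)}

print(len(outcome_space))            # number of possible outcomes
print(outcome_space == set_builder)  # the two descriptions coincide
```

Three binary coins give 2 × 2 × 2 = 8 outcomes, which is why neither {1, 2, 3} nor {0, 1} can be the outcome space.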
Question 2. [1 mark]
Suppose you roll a fair twenty-sided die. The outcome space is X = {1, 2, 3, . . . , 20}. Which of the
following is an event?
a. {0, 1, 2}
b. {x ∈ X | x > 10}
c. 12
d. {12}
Solution:
The correct answers are:
• b. {x ∈ X | x > 10}
• d. {12}
Explanation:
An event is any subset of the outcome space X.
a. {0, 1, 2}: Incorrect. This set includes 0, which is not in X, so it is not a subset of X and
therefore not an event.
b. {x ∈ X | x > 10}: Correct. This set includes all outcomes greater than 10 within X,
specifically {11, 12, 13, 14, 15, 16, 17, 18, 19, 20}. It is a subset of X and thus an event.
c. 12: Incorrect. This is a single outcome (an element of X), not a subset of X, so it is not an
event.
d. {12}: Correct. This is the singleton set containing the element 12, which is a subset of X
and therefore an event.
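The subset test used in this explanation can be sketched in a few lines. The helper `is_event` is a hypothetical name introduced here, and treating "event" as "any Python set that is a subset of the outcome space" is a simplification that suits a finite outcome space like this one.

```python
outcome_space = set(range(1, 21))  # X = {1, 2, ..., 20}

def is_event(candidate, space):
    """Return True if candidate is a set of outcomes contained in space."""
    return isinstance(candidate, (set, frozenset)) and candidate <= space

options = {
    "a": {0, 1, 2},                               # contains 0, which is not in X
    "b": {x for x in outcome_space if x > 10},    # subset of X
    "c": 12,                                      # an outcome, not a set
    "d": {12},                                    # singleton subset of X
}
results = {name: is_event(opt, outcome_space) for name, opt in options.items()}
print(results)
```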
Question 3. [1 mark]
Which of the following is an event from the outcome space X × Y, where X = R^2 and Y = R?
a. X × Y
b. A function f : R^2 × R → R
c. ((1, 2), 3)
d. {((x1 , x2 ), y) ∈ X × Y | y ≥ 300}
Solution:
Answers: a. X × Y and d. {((x1 , x2 ), y) ∈ X × Y | y ≥ 300}
Explanation:
An event is any subset of the outcome space. Given the outcome space X × Y, where X = R^2 and
Y = R, let’s analyze each option:
1. Option a: X × Y. Correct. This represents the entire outcome space itself, which is a valid
event (the certain event).
3. Option c: ((1, 2), 3). Incorrect. While ((1, 2), 3) is an element (outcome) of X × Y, it is not
an event. An event is a set of outcomes; a single outcome counts as an event only when written
as a singleton set, such as {((1, 2), 3)}.
Question 4. [1 mark]
Which of the following is an event from the outcome space (X × Y)^n, where X = R^2 and Y = R?
a. (X × Y)^n
b. (((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) where x_{i,1}, x_{i,2}, y_i ∈ R for all i ∈ {1, ..., n}
c. {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | y_i ≥ 300 for all i ∈ {1, ..., n}}
d. {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | x_{i,1} ≥ 3 and y_i ≥ 300 for all i ∈ {1, ..., n}}
Solution:
Answers: a. (X × Y)^n, c. {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | y_i ≥ 300 for all i ∈
{1, ..., n}}, and d. {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | x_{i,1} ≥ 3 and y_i ≥
300 for all i ∈ {1, ..., n}}
Explanation:
An event is a subset of the outcome space (X × Y)^n. Let’s evaluate each option:
1. Option a: (X × Y)^n. Correct. This represents the entire outcome space, which is a valid
event (the certain event).
2. Option b: (((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) where x_{i,1}, x_{i,2}, y_i ∈ R for all i ∈ {1, ..., n}.
Incorrect. This describes a single ordered n-tuple (a specific outcome), not an event. An
event should be a set of such outcomes.
3. Option c: {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | y_i ≥ 300 for all i ∈ {1, ..., n}}.
Correct. This is a subset of (X × Y)^n where all y_i satisfy y_i ≥ 300. It is a valid event.
4. Option d: {(((x_{1,1}, x_{1,2}), y_1), ..., ((x_{n,1}, x_{n,2}), y_n)) ∈ (X × Y)^n | x_{i,1} ≥ 3 and y_i ≥ 300 for all i ∈
{1, ..., n}}. Correct. This subset imposes conditions on both x_{i,1} and y_i for all i, making it
a valid event.
Question 5. [1 mark]
Suppose you have a random variable Y representing the house prices in a city. You know Y
is distributed according to the normal distribution with mean 100,000 and standard deviation
10,000. Is the following True or False? Y is a continuous random variable.
Solution:
Answer: True.
Explanation:
A random variable is continuous if it can take on any value within a certain interval or range,
and its probability distribution is described by a continuous probability density function (pdf).
The normal distribution is a continuous distribution defined over the entire real line R. Since Y
follows a normal distribution with mean 100,000 and standard deviation 10,000, it can take any
real value, and its probabilities are determined by a continuous pdf. Therefore, Y is a continuous
random variable.
Question 6. [1 mark]
Suppose you have a random variable X representing the age of houses in a city. You know X is
distributed according to the continuous uniform distribution with outcome space X = [10, 20]. Let
p be the pdf of X. Is the following True or False? p(17) is the probability that X = 17.
Solution:
Answer: False.
Explanation:
For a continuous random variable X, the probability that X takes on any specific value x is zero,
i.e., P(X = x) = 0. The probability density function p(x) does not give the probability at a specific
point but rather represents the density of the distribution at that point. To find the probability
that X falls within an interval [a, b], we integrate the pdf over that interval:
$$P(a \le X \le b) = \int_a^b p(x)\,dx$$
Therefore, p(17) is not the probability that X = 17; it is the value of the pdf at x = 17.
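The distinction between a density value and a probability can be made concrete for the uniform distribution on [10, 20]. The sketch below is illustrative only; the function names `uniform_pdf` and `uniform_prob` are assumptions introduced here, not part of the assignment.

```python
def uniform_pdf(x, low=10.0, high=20.0):
    """Density of the continuous uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def uniform_prob(a, b, low=10.0, high=20.0):
    """P(a <= X <= b): the constant density integrated over the overlap with [low, high]."""
    left, right = max(a, low), min(b, high)
    return max(0.0, right - left) / (high - low)

density_at_17 = uniform_pdf(17)         # a density value, not a probability
prob_exactly_17 = uniform_prob(17, 17)  # an interval of zero width
prob_16_to_18 = uniform_prob(16, 18)    # an interval of width 2

print(density_at_17, prob_exactly_17, prob_16_to_18)
```

The density at 17 is 1/10, while the probability of the zero-width interval [17, 17] is 0; only intervals of positive width receive positive probability.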
Question 7. [1 mark]
For a continuous random variable X ∈ X, we know that P(X = x) = 0 for all x ∈ X. How is it
possible that the pdf p(x) ≠ 0 for all x ∈ X?
a. Because p(x) is not the probability of x. Instead, it is just a function that we can integrate
over to get a probability.
Solution:
The correct answer is:
• a. Because p(x) is not the probability of x. Instead, it is just a function that we can integrate
over to get a probability.
Explanation:
For continuous random variables, the probability of any specific value x is zero, i.e., P(X = x) = 0.
The probability density function p(x) represents the density of the distribution at point x and is
used to calculate probabilities over intervals through integration:
$$P(a \le X \le b) = \int_a^b p(x)\,dx$$
Therefore, even though P(X = x) = 0, the pdf p(x) can be non-zero for all x ∈ X because it is not
the probability at x, but a function that describes how probability is distributed over the range of
X.
Option b is incorrect because the pdf does not measure probability mass at a point. Option c is
incorrect because the pdf is not always zero for continuous variables. Option d is irrelevant because
X is a continuous random variable, not discrete.
Question 8. [1 mark]
Suppose that Y ∈ R is distributed according to Laplace(10, 2). Is the following True or False? The
probability distribution of Y is P(Y = y) = (1/4) exp(−|y − 10| / 2) for y ∈ R.
Solution:
Answer: False.
Explanation:
For a continuous random variable Y following a Laplace distribution with mean µ = 10 and scale
parameter b = 2, the probability density function (pdf) is:
$$p(y) = \frac{1}{2b} \exp\left(-\frac{|y - \mu|}{b}\right) = \frac{1}{4} \exp\left(-\frac{|y - 10|}{2}\right)$$
However, the probability that Y takes on any specific value y is zero, i.e., P(Y = y) = 0 for all
y ∈ R.
Therefore, the expression P(Y = y) = (1/4) exp(−|y − 10| / 2) is incorrect. It should be stated as the
probability density function p(y), not the probability mass function P(Y = y).
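As a rough numerical check that this expression behaves like a density rather than a probability, it can be integrated with a simple midpoint rule. The helper names `laplace_pdf` and `integrate`, the truncation of the real line to [-90, 110], and the step count are all assumptions made for this sketch.

```python
import math

def laplace_pdf(y, mu=10.0, b=2.0):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(y - mu) / b) / (2.0 * b)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

peak_density = laplace_pdf(10.0)                    # value of the pdf at y = 10
total_mass = integrate(laplace_pdf, -90.0, 110.0)   # should be close to 1
mass_8_to_12 = integrate(laplace_pdf, 8.0, 12.0)    # P(8 <= Y <= 12)

print(peak_density, total_mass, mass_8_to_12)
```

The value at y = 10 is 1/4, yet the probability of any single point is still zero; probabilities come only from integrating over an interval, such as [8, 12], whose exact probability is 1 − e^(−1).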
Question 9. [1 mark]
Suppose you have two discrete random variables X ∈ {0, 1} and Y ∈ {1, 2, 3}. The joint probability
mass function (pmf) of X and Y is given by the following values:
p(0, 1) = 1/24, p(0, 2) = 1/12, p(0, 3) = 1/3,
p(1, 1) = 1/6, p(1, 2) = 3/24, p(1, 3) = 3/12.
Which of the following is the marginal pmf of X?
a. pX(0) = 11/24, pX(1) = 13/24
b. pX(0) = 1/2, pX(1) = 1/2
c. pX(0) = 7/12, pX(1) = 5/12
d. pX(0) = 5/12, pX(1) = 7/12
Solution:
Answer:
• a. pX(0) = 11/24, pX(1) = 13/24
Explanation:
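To find the marginal pmf pX(x), we sum the joint pmf over all possible values of Y for each value of X.
For x = 0:
$$
\begin{aligned}
p_X(0) &= p(0, 1) + p(0, 2) + p(0, 3) \\
&= \frac{1}{24} + \frac{1}{12} + \frac{1}{3} \\
&= \frac{1}{24} + \frac{2}{24} + \frac{8}{24} \quad \text{(common denominator of 24)} \\
&= \frac{11}{24}
\end{aligned}
$$
For x = 1:
$$
\begin{aligned}
p_X(1) &= p(1, 1) + p(1, 2) + p(1, 3) \\
&= \frac{1}{6} + \frac{3}{24} + \frac{3}{12} \\
&= \frac{4}{24} + \frac{3}{24} + \frac{6}{24} \quad \text{(common denominator of 24)} \\
&= \frac{13}{24}
\end{aligned}
$$
Therefore, the marginal pmf of X is:
$$p_X(0) = \frac{11}{24}, \quad p_X(1) = \frac{13}{24}$$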
This corresponds to option a.
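The marginalization can be reproduced with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the dictionary layout and the name `marginal_x` are illustrative choices, not something prescribed by the assignment.

```python
from fractions import Fraction

# Joint pmf p(x, y) from the question, keyed by (x, y).
joint = {
    (0, 1): Fraction(1, 24), (0, 2): Fraction(1, 12), (0, 3): Fraction(1, 3),
    (1, 1): Fraction(1, 6),  (1, 2): Fraction(3, 24), (1, 3): Fraction(3, 12),
}

def marginal_x(joint_pmf):
    """Sum the joint pmf over y for each value of x."""
    marg = {}
    for (x, _y), prob in joint_pmf.items():
        marg[x] = marg.get(x, Fraction(0)) + prob
    return marg

p_x = marginal_x(joint)
print(p_x)
```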
a.
pY|X(1|0) = 1/11, pY|X(2|0) = 2/11, pY|X(3|0) = 8/11
pY|X(1|1) = 4/13, pY|X(2|1) = 3/13, pY|X(3|1) = 6/13
b.
pY|X(1|0) = 1/3, pY|X(2|0) = 1/3, pY|X(3|0) = 1/3
pY|X(1|1) = 1/3, pY|X(2|1) = 1/3, pY|X(3|1) = 1/3
c.
pY|X(1|0) = 1/2, pY|X(2|0) = 1/4, pY|X(3|0) = 1/4
pY|X(1|1) = 1/4, pY|X(2|1) = 1/4, pY|X(3|1) = 1/2
d.
pY|X(1|0) = 1/6, pY|X(2|0) = 1/3, pY|X(3|0) = 1/2
pY|X(1|1) = 2/7, pY|X(2|1) = 1/7, pY|X(3|1) = 4/7
Solution:
Answer: a.
Explanation:
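The conditional pmf pY|X(y|x) is calculated using:
$$p_{Y|X}(y \mid x) = \frac{p(x, y)}{p_X(x)}$$
$$p_X(0) = \frac{11}{24}, \quad p_X(1) = \frac{13}{24}$$
For x = 0:
$$
\begin{aligned}
p_{Y|X}(1 \mid 0) &= \frac{p(0, 1)}{p_X(0)} = \frac{1/24}{11/24} = \frac{1}{11} \\
p_{Y|X}(2 \mid 0) &= \frac{p(0, 2)}{p_X(0)} = \frac{1/12}{11/24} = \frac{2/24}{11/24} = \frac{2}{11} \\
p_{Y|X}(3 \mid 0) &= \frac{p(0, 3)}{p_X(0)} = \frac{1/3}{11/24} = \frac{8/24}{11/24} = \frac{8}{11}
\end{aligned}
$$
For x = 1:
$$
\begin{aligned}
p_{Y|X}(1 \mid 1) &= \frac{p(1, 1)}{p_X(1)} = \frac{1/6}{13/24} = \frac{4/24}{13/24} = \frac{4}{13} \\
p_{Y|X}(2 \mid 1) &= \frac{p(1, 2)}{p_X(1)} = \frac{3/24}{13/24} = \frac{3}{13} \\
p_{Y|X}(3 \mid 1) &= \frac{p(1, 3)}{p_X(1)} = \frac{3/12}{13/24} = \frac{6/24}{13/24} = \frac{6}{13}
\end{aligned}
$$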
Therefore, the conditional pmf matches the one provided in option a.
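The division of each joint probability by the matching marginal can be sketched with exact fractions. The joint pmf values are taken from the question above; the function name `conditional_y_given_x` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction

# Joint pmf p(x, y), keyed by (x, y).
joint = {
    (0, 1): Fraction(1, 24), (0, 2): Fraction(1, 12), (0, 3): Fraction(1, 3),
    (1, 1): Fraction(1, 6),  (1, 2): Fraction(3, 24), (1, 3): Fraction(3, 12),
}

def conditional_y_given_x(joint_pmf, x):
    """Return the pmf of Y given X = x as p(x, y) / p_X(x)."""
    p_x = sum(prob for (xi, _y), prob in joint_pmf.items() if xi == x)
    return {y: prob / p_x for (xi, y), prob in joint_pmf.items() if xi == x}

cond_given_0 = conditional_y_given_x(joint, 0)
cond_given_1 = conditional_y_given_x(joint, 1)
print(cond_given_0)
print(cond_given_1)
```

Each conditional pmf sums to 1, which is a quick sanity check on the normalization by pX(x).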
Solution:
Answer: False.
Explanation:
Two random variables X and Y are independent if and only if p(x, y) = pX(x) · pY(y) for all x
and y.
Example for x = 0, y = 1:
$$p_X(0) \cdot p_Y(1) = \frac{11}{24} \cdot \frac{5}{24} = \frac{55}{576}, \qquad p(0, 1) = \frac{1}{24} = \frac{24}{576}$$
Since 55/576 ≠ 24/576, they are not equal.
Example for x = 1, y = 3:
$$p_X(1) \cdot p_Y(3) = \frac{13}{24} \cdot \frac{7}{12} = \frac{91}{288}, \qquad p(1, 3) = \frac{3}{12} = \frac{6}{24} = \frac{72}{288}$$
Again, 91/288 ≠ 72/288.
Since there exists an x, y such that p(x, y) ≠ pX(x) · pY(y), the random variables X and Y are not
independent.
Alternatively, the fact that the conditional pmf pY|X(y|x) is not equal to pY(y) confirms that X
and Y are not independent.
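The factorization test can be run over every (x, y) pair rather than just the two examples. This is a minimal sketch with exact fractions; the joint pmf is the one given in the earlier question, and names such as `violations` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Joint pmf p(x, y), keyed by (x, y).
joint = {
    (0, 1): Fraction(1, 24), (0, 2): Fraction(1, 12), (0, 3): Fraction(1, 3),
    (1, 1): Fraction(1, 6),  (1, 2): Fraction(3, 24), (1, 3): Fraction(3, 12),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals obtained by summing out the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Pairs where the joint pmf differs from the product of marginals.
violations = [(x, y) for x, y in product(xs, ys)
              if joint[(x, y)] != p_x[x] * p_y[y]]
independent = not violations
print(independent, violations)
```

A single violating pair is enough to rule out independence.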
be the event that you get tails for the first n − 1 flips and then heads on flip n. What is PX (E) if
n = 4?
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)
Solution:
Answer: PX (E) = 0.0189
Explanation:
Since P = Bernoulli(0.7), the pmf p is p(1) = 0.7 and p(0) = 0.3. Since the coin flips are
independent and identically distributed, the joint distribution PX can be written as the product of
the marginal distribution P:
$$
\begin{aligned}
P_X(E) &= P_X(X_1 = 0, X_2 = 0, X_3 = 0, X_4 = 1) \\
&= P(X_1 = 0) \cdot P(X_2 = 0) \cdot P(X_3 = 0) \cdot P(X_4 = 1) \\
&= p(0) \cdot p(0) \cdot p(0) \cdot p(1) \\
&= 0.3 \times 0.3 \times 0.3 \times 0.7 \\
&= 0.0189
\end{aligned}
$$
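The same product can be written for a general n. The sketch below assumes, as the solution does, that 1 encodes heads with probability 0.7 and that flips are i.i.d.; the function name `first_heads_on_flip` is hypothetical.

```python
def first_heads_on_flip(n, p_heads=0.7):
    """Probability of n - 1 tails followed by heads on flip n, for i.i.d. flips."""
    return (1.0 - p_heads) ** (n - 1) * p_heads

prob_n4 = first_heads_on_flip(4)  # 0.3 * 0.3 * 0.3 * 0.7
print(prob_n4)
```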
a. Yes, F
b. Yes, R
c. No, F
d. No, R
Solution:
Answer: a. Yes, F
Explanation:
g(X) is a function of a random variable, and it maps elements from X to F, where F consists of
functions from X to R of the form f(x) = xw, with w ∈ R. Since the output of g(X) is a function
in F, g(X) is indeed a random variable, and its outcome space is F.
b. σ^2
c. µ/n
d. nµ
Solution:
Answer: a. µ
Explanation:
The expected value of the sample mean X̄ is:
$$E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n} \cdot n\mu = \mu.$$
Solution:
Answer: 3
Explanation:
Compute the expected value:
Solution:
Answer: E[C(X)] = 250
Explanation:
We are asked to compute the expected cost:
$$E[C(X)] = E[X^3] = \int_0^{10} x^3 \cdot p(x)\,dx$$
Given that X is uniformly distributed over [0, 10], the density function is:
$$p(x) = \frac{1}{10} \quad \text{for } 0 \le x \le 10$$
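Substituting this density into the integral gives the stated answer. The following worked evaluation is a sketch of the remaining steps, assuming the cost function is C(x) = x^3 as written in the expectation above:

```latex
\begin{aligned}
E[C(X)] &= \int_0^{10} x^3 \cdot \frac{1}{10}\,dx \\
        &= \frac{1}{10} \left[ \frac{x^4}{4} \right]_0^{10} \\
        &= \frac{1}{10} \cdot \frac{10^4}{4} \\
        &= \frac{10000}{40} = 250
\end{aligned}
```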
Let N be a random variable that takes values based on the roll of a biased four-sided die. The pmf
of N is:
$$p_N(n) = \frac{n}{10} \text{ for } n \in \{1, 2, 3, 4\}, \qquad p_N(n) = 0 \text{ otherwise.}$$
What is E[N]?
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)
Solution:
Answer: E[N ] = 3
Explanation:
First, list the probabilities:
$$p_N(1) = \frac{1}{10}, \quad p_N(2) = \frac{2}{10}, \quad p_N(3) = \frac{3}{10}, \quad p_N(4) = \frac{4}{10}$$
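With the probabilities listed, the expectation is the probability-weighted sum of the values of N. A minimal sketch with exact fractions follows; the variable names are illustrative.

```python
from fractions import Fraction

# pmf of the biased four-sided die: p_N(n) = n / 10 for n in {1, 2, 3, 4}.
pmf_n = {n: Fraction(n, 10) for n in range(1, 5)}

# E[N] = sum over n of n * p_N(n) = (1 + 4 + 9 + 16) / 10.
expected_n = sum(n * p for n, p in pmf_n.items())
print(expected_n)
```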
Solution:
Answer: Var[N] = 1
Explanation:
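First, compute E[N^2]:
$$
\begin{aligned}
E[N^2] &= \sum_{n=1}^{4} n^2 \cdot p_N(n) \\
&= (1^2) \cdot \frac{1}{10} + (2^2) \cdot \frac{2}{10} + (3^2) \cdot \frac{3}{10} + (4^2) \cdot \frac{4}{10} \\
&= \frac{1}{10} + \frac{8}{10} + \frac{27}{10} + \frac{64}{10} \\
&= \frac{1 + 8 + 27 + 64}{10} \\
&= \frac{100}{10} \\
&= 10
\end{aligned}
$$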
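The variance then follows from the identity Var[N] = E[N^2] − (E[N])^2, using E[N] = 3 from the previous question. The sketch below recomputes both moments with exact fractions, assuming the same pmf pN(n) = n/10 on {1, 2, 3, 4}.

```python
from fractions import Fraction

# pmf of the biased four-sided die: p_N(n) = n / 10 for n in {1, 2, 3, 4}.
pmf_n = {n: Fraction(n, 10) for n in range(1, 5)}

mean_n = sum(n * p for n, p in pmf_n.items())            # E[N]
second_moment = sum(n ** 2 * p for n, p in pmf_n.items())  # E[N^2]
variance_n = second_moment - mean_n ** 2                 # E[N^2] - (E[N])^2
print(mean_n, second_moment, variance_n)
```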
What is the conditional expectation E[N |X = 1] (i.e., the expectation with respect to this condi-
tional pmf)?
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)
Solution:
Answer: E[N | X = 1] = 20/7 ≈ 2.86
Explanation:
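The conditional expectation E[N | X = 1] is computed as:
$$
\begin{aligned}
E[N \mid X = 1] &= \sum_{n=1}^{4} n \cdot p_{N|X}(n \mid 1) \\
&= (1) \cdot \frac{1}{7} + (2) \cdot \frac{3}{14} + (3) \cdot \frac{2}{7} + (4) \cdot \frac{5}{14} \\
&= \frac{1}{7} + \frac{6}{14} + \frac{6}{7} + \frac{20}{14} \\
&= \frac{1}{7} + \frac{3}{7} + \frac{6}{7} + \frac{10}{7} \\
&= \frac{1 + 3 + 6 + 10}{7} \\
&= \frac{20}{7}
\end{aligned}
$$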
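The weighted sum can be checked with exact fractions and then converted to the decimal form the question asks for. The conditional pmf values below are read off the computation above; the variable names are illustrative.

```python
from fractions import Fraction

# Conditional pmf p_{N|X}(n | 1), as used in the computation above.
cond_pmf = {
    1: Fraction(1, 7),
    2: Fraction(3, 14),
    3: Fraction(2, 7),
    4: Fraction(5, 14),
}

cond_mean = sum(n * p for n, p in cond_pmf.items())  # E[N | X = 1]
cond_mean_decimal = round(float(cond_mean), 2)       # two decimal places
print(cond_mean, cond_mean_decimal)
```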
a. σ^2
b. nσ^2
c. σ^2/n
d. n^2σ^2
Solution:
Answer: b. nσ^2
Explanation:
We are tasked with finding the variance of the sum of n independent and identically distributed
(i.i.d.) normal random variables X1, X2, ..., Xn, each with mean µ and variance σ^2.
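A short derivation consistent with the stated answer, sketched under the assumption that only independence and the common variance are needed (normality is not used in this step):

```latex
\begin{aligned}
\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right]
  &= \sum_{i=1}^{n} \operatorname{Var}[X_i]
     && \text{(independence: all covariance terms vanish)} \\
  &= \sum_{i=1}^{n} \sigma^2
     && \text{(identical distribution)} \\
  &= n\sigma^2
\end{aligned}
```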