HW01 - Math Recap
The machine learning lecture relies on your knowledge of undergraduate mathematics, especially linear
algebra and probability theory. You should think of this homework as a test to see if you meet the
prerequisites for taking this course.
Homework
Reading
We strongly recommend that you review the following documents to refresh your knowledge. You should
already be familiar with most of their content from your previous studies.
Linear Algebra
Notation. We use the following notation in this lecture:
Problem 1: Consider the expression

f(x, y, Z) = x^T A y + B x − x^T C Z D − y^T E^T y + F.

What should be the dimensions (shapes) of the matrices A, B, C, D, E, F for the expression above to be a valid mathematical expression?
Problem 2: Let x ∈ R^N, M ∈ R^{N×N}. Express the function

f(x) = Σ_{i=1}^N Σ_{j=1}^N x_i x_j M_ij

using only matrix-vector multiplications. Show your work and briefly explain your steps.
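Once you have a candidate matrix-vector form, you can sanity-check it numerically. A minimal NumPy sketch (the sizes and the candidate expression `x @ M @ x` are illustrative assumptions for the check, not the graded derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
x = rng.standard_normal(N)
M = rng.standard_normal((N, N))

# The double sum from the problem statement, computed index by index.
f_sum = sum(x[i] * x[j] * M[i, j] for i in range(N) for j in range(N))

# A candidate matrix-vector form to test against the double sum.
f_mat = x @ M @ x

assert np.isclose(f_sum, f_mat)
```

Such a check does not replace the derivation, but it catches transposition and indexing mistakes quickly.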
Upload a single PDF file with your homework solution to Moodle by 30.01.2024, 23:59. We recommend typesetting your solution
(using LaTeX or Word), but handwritten solutions are also accepted (bring them to the next lecture or put them in my box).
Collaboration is fine, but submitting the same or extremely similar solutions is not allowed. The homework rules are in the syllabus.
CS251/CS340 Machine Learning Page 2
Let A ∈ R^{M×N} and b ∈ R^M, and consider the system of linear equations

Ax = b. (1)

a) Under what conditions does the system of linear equations have a unique solution x for any choice of b?
b) Assume that M = N = 4 and that A has the following eigenvalues: {−1, 0, 4, 4}. Does Equation 1 have a unique solution x for any choice of b? Justify your answer.
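For part b) you can also experiment numerically. The sketch below builds one (illustrative) matrix with exactly the eigenvalues {−1, 0, 4, 4} via a similarity transform and inspects its rank:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 4))  # a generic Q is invertible with high probability
A = Q @ np.diag([-1.0, 0.0, 4.0, 4.0]) @ np.linalg.inv(Q)

# det(A) equals the product of the eigenvalues, and a zero eigenvalue
# makes A singular (rank-deficient).
print(np.linalg.matrix_rank(A))  # expect 3
print(np.linalg.det(A))          # expect (numerically) 0
```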
Problem 5: A symmetric matrix A ∈ R^{N×N} is positive semi-definite (PSD) if and only if for any x ∈ R^N it holds that x^T A x ≥ 0. Prove that a symmetric matrix A is PSD if and only if it has no negative eigenvalues.
Problem 6: Let A ∈ R^{M×N}. Prove that the matrix B = A^T A is positive semi-definite for any A.
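Before writing the proof, you can convince yourself numerically. A minimal sketch (the shape of A here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # any rectangular A
B = A.T @ A                       # B is 5x5 and symmetric

# eigvalsh is the eigensolver for symmetric matrices; all eigenvalues of B
# should be nonnegative (up to floating-point roundoff).
eigvals = np.linalg.eigvalsh(B)
assert np.all(eigvals >= -1e-10)
```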
Calculus
Problem 7: Consider the following function f : R → R,

f(x) = (1/2) a x^2 + b x + c,

and the optimization problem

min_{x ∈ R} f(x).
a) Under what conditions does this optimization problem have (i) a unique solution, (ii) infinitely many solutions, or (iii) no solution? Justify your answer.
b) Assume that the optimization problem has a unique solution. Write down the closed-form expression for x∗ that minimizes the objective function, i.e., find x∗ = arg min_{x ∈ R} f(x).
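You can cross-check your closed-form x∗ against a brute-force grid search (the coefficients below are illustrative; any a > 0 works):

```python
import numpy as np

a, b, c = 2.0, -3.0, 1.0                  # illustrative coefficients, a > 0
f = lambda x: 0.5 * a * x**2 + b * x + c

# Evaluate f on a fine grid and take the minimizing grid point.
xs = np.linspace(-10.0, 10.0, 200_001)
x_star_numeric = xs[np.argmin(f(xs))]
print(x_star_numeric)  # compare with your closed-form x*
```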
Now consider the analogous problem in R^N. For a symmetric matrix A ∈ R^{N×N}, a vector b ∈ R^N, and c ∈ R, let

g(x) = (1/2) x^T A x + b^T x + c,

and consider the optimization problem

min_{x ∈ R^N} g(x).
a) Compute the Hessian ∇²g(x) of the objective function. Under what conditions does this optimization problem have a unique solution?
b) Why is it necessary for a matrix A to be PSD for the optimization problem to be well-defined?
What happens if A has a negative eigenvalue?
c) Assume that the matrix A is positive definite (PD). Write down the closed-form expression for x∗ that minimizes the objective function, i.e., find x∗ = arg min_{x ∈ R^N} g(x).
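As in the scalar case, a numerical check is easy: run plain gradient descent (using ∇g(x) = Ax + b for symmetric A) on an illustrative positive definite instance and compare with your closed-form x∗:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)   # symmetric positive definite by construction
b = rng.standard_normal(N)

# Gradient descent on g(x) = 0.5 x^T A x + b^T x + c; the gradient is A x + b.
x = np.zeros(N)
for _ in range(5000):
    x -= 0.01 * (A @ x + b)

print(np.linalg.norm(A @ x + b))  # gradient norm at the iterate: close to 0
```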
Probability Theory
Notation. We use the following notation in our lecture:
• For conciseness and to avoid clutter, we use p(x) to denote multiple things:
1. If X is a discrete random variable, p(x) denotes the probability mass function (PMF) of X at point x (usually denoted as p_X(x) or p(X = x) in the statistics literature).
2. If X is a continuous random variable, p(x) denotes the probability density function (PDF) of X at point x (usually denoted as f_X(x) in the statistics literature).
3. If A ⊆ Ω is an event, p(A) denotes the probability of this event (usually denoted as Pr(A) or P(A) in the statistics literature).
You will mostly encounter (1) and (2) throughout the lecture. Usually, the meaning is clear from the
context.
• Given the distribution p(x), we may be interested in computing the expected value E_{p(x)}[f(x)] or, equivalently, E_X[f(x)]. Usually, it is clear with respect to which distribution we are computing the expectation, so we omit the subscript and simply write E[f(x)].
• x ∼ p means that x is distributed (sampled) according to the distribution p. For example, x ∼ N(µ, σ²) (or equivalently p(x) = N(µ, σ²)) means that x is distributed according to the normal distribution with mean µ and variance σ².
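In code, "x ∼ N(µ, σ²)" corresponds to drawing samples from that distribution. A small NumPy sketch (the values of µ and σ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.0, 1.0
samples = rng.normal(mu, sigma, size=100_000)  # x ~ N(mu, sigma^2)

# Empirical mean and standard deviation approximate mu and sigma.
print(samples.mean(), samples.std())
```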
Problem 10: Exponential families include many of the most common distributions, such as normal,
exponential, Bernoulli, categorical, etc. You are given the general form of the PDF (PMF in the discrete
case) p_θ(x) (also written as p(x | θ)) of the distributions from the exponential family below:

p(x | θ) = h(x) c(θ) exp[ Σ_{i=1}^k w_i(θ) t_i(x) ],  θ ∈ Θ,

where Θ is the parameter space, h(x) ≥ 0 and the t_i(x) only depend on x, and similarly, c(θ) ≥ 0 and the w_i(θ) only depend on the (possibly vector-valued) parameter θ.
Your task is to express the Binomial distribution as an exponential family distribution. Also express the Beta distribution as an exponential family distribution. Show that the product of the Beta and the Binomial distribution is also a member of the exponential family.
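Once you have identified h, c, the w_i and the t_i for the Binomial distribution, you can verify your factorization numerically. The sketch below checks one possible factorization against the textbook PMF; treat it as a template for checking your own derivation, not as the write-up itself:

```python
import math

# Candidate factorization of Binomial(n, p):
#   h(x) = C(n, x), c(p) = (1 - p)^n, w(p) = log(p / (1 - p)), t(x) = x.
n, p = 10, 0.3
for x in range(n + 1):
    pmf = math.comb(n, x) * p**x * (1 - p) ** (n - x)
    fact = math.comb(n, x) * (1 - p) ** n * math.exp(x * math.log(p / (1 - p)))
    assert math.isclose(pmf, fact)
print("factorization matches the PMF for all x")
```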
Problem 12: Consider the following bivariate distribution p(x, y) of two discrete random variables X
and Y .
Compute:
Problem 13: You are given the joint PDF p(a, b, c) of three continuous random variables. Show how the following expressions can be obtained using the rules of probability:
1. p(a)
2. p(c | a, b)
3. p(b | c)
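The same rules can be exercised on a discrete joint distribution, where marginalization becomes summation over array axes. A NumPy sketch (the shape of the joint is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p_abc = rng.random((2, 3, 4))   # axes index the values of a, b, c
p_abc /= p_abc.sum()            # normalize to a valid joint distribution

p_a = p_abc.sum(axis=(1, 2))                           # marginalize out b and c
p_ab = p_abc.sum(axis=2)                               # p(a, b)
p_c_given_ab = p_abc / p_ab[:, :, None]                # p(c | a, b) = p(a,b,c) / p(a,b)
p_bc = p_abc.sum(axis=0)                               # p(b, c)
p_b_given_c = p_bc / p_bc.sum(axis=0, keepdims=True)   # p(b | c) = p(b,c) / p(c)

assert np.isclose(p_a.sum(), 1.0)
assert np.allclose(p_c_given_ab.sum(axis=2), 1.0)
assert np.allclose(p_b_given_c.sum(axis=0), 1.0)
```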
Problem 14: In this problem, there are two bowls. The first bowl holds three pineapples and three oranges, while the second bowl has three pineapples and five oranges. Additionally, there is a biased coin, which lands on "tails" with a 0.7 probability and "heads" with a 0.3 probability. If the coin lands on "heads", a piece of fruit is randomly selected from the first bowl; if it lands on "tails", the fruit is chosen from the second bowl. Your friend flips the coin (which you cannot see), selects a piece of fruit from the corresponding bowl, and hands you a pineapple. Determine the probability that the pineapple was selected from the second bowl.
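A Monte Carlo simulation is a useful sanity check for your Bayes' rule calculation. The sketch below simply replays the story many times; the exact answer should come from the hand derivation:

```python
import random

random.seed(0)
n_trials = 200_000
pineapples = from_bowl2 = 0
for _ in range(n_trials):
    tails = random.random() < 0.7   # biased coin: tails with probability 0.7
    if tails:                       # tails -> bowl 2: 3 pineapples, 5 oranges
        got_pineapple = random.random() < 3 / 8
    else:                           # heads -> bowl 1: 3 pineapples, 3 oranges
        got_pineapple = random.random() < 3 / 6
    if got_pineapple:
        pineapples += 1
        from_bowl2 += tails

print(from_bowl2 / pineapples)  # estimate of P(bowl 2 | pineapple)
```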
Problem 15: (Iterated Expectations) Consider two random variables X, Y with joint distribution p(x, y).
Show that
E_X[x] = E_Y[ E_X[x | y] ].

Here, E_X[x | y] denotes the expected value of x under the conditional distribution p(x | y).
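The identity can be checked numerically on a small discrete joint distribution (the table below is illustrative; rows index the values of X, columns the values of Y):

```python
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])   # joint p(x, y), sums to 1

p_y = p_xy.sum(axis=0)                                    # marginal p(y)
e_x = (x_vals[:, None] * p_xy).sum()                      # E[X] directly
e_x_given_y = (x_vals[:, None] * p_xy).sum(axis=0) / p_y  # E[X | Y = y]

# Tower property: averaging the conditional expectations over p(y) gives E[X].
assert np.isclose(e_x, (e_x_given_y * p_y).sum())
```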
Problem 16: Let X ∼ N(µ, σ²), and f(x) = a x + b x² + c. What is E[f(X)]?
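A Monte Carlo estimate lets you check the closed form you derive (µ, σ, and the coefficients below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
a, b, c = 3.0, -1.0, 0.5

x = rng.normal(mu, sigma, size=1_000_000)
est = np.mean(a * x + b * x**2 + c)   # Monte Carlo estimate of E[f(X)]
print(est)                            # compare with your closed-form answer
```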
Problem 17: Let p(x) = N(x | µ, Σ), and g(x) = Ax (where A ∈ R^{N×N}). What are the values of the following expressions:
• E[g(x)],
• E[g(x) g(x)^T],
• E[g(x)^T g(x)],
• the covariance matrix Cov[g(x)].
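All of these quantities can be estimated by sampling and compared against your closed-form answers. A NumPy sketch (A, µ, and Σ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 3
A = rng.standard_normal((N, N))
mu = rng.standard_normal(N)
L = rng.standard_normal((N, N))
Sigma = L @ L.T + np.eye(N)      # a valid (positive definite) covariance matrix

x = rng.multivariate_normal(mu, Sigma, size=500_000)
gx = x @ A.T                     # g(x) = A x, applied to every sample row

mean_g = gx.mean(axis=0)         # estimates E[g(x)]
cov_g = np.cov(gx, rowvar=False) # estimates Cov[g(x)]
print(mean_g)
print(cov_g)
```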