Assignment 1: Statistical Machine Learning, Summer Term 2022
On this sheet we are going to recap the basic maths that will be needed to follow this course
(note that the contents of “Maths for ML” are a prerequisite to the course) and provide the
instructions to install Python. There are basic recap documents and links on the course webpage,
recap slides at the end of the slides of this class, and YouTube videos in this playlist:
https://www.youtube.com/playlist?list=PL05umP7R6ij1a6KdEy8PVE9zoCv6SlHRS
              Y = Small   Y = Medium   Y = Large
X = Male         0.1         0.15        0.25
X = Female       0.3         0.1         0.1
(a) The marginal probabilities refer to the probability of only one variable, P(X = x) or in
short P(x). Compute the marginal probabilities of X and Y. We have that P(X = Male) +
P(X = Female) = 1. Why?
(b) Calculate the expected value of X, E(X). Let $x_i$ for $i = 1, \dots, n$ be i.i.d. samples from a
distribution X. Then the empirical mean is defined as
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The empirical mean is an estimate for the expected value. In particular the weak law of
large numbers holds. It states that for all ε > 0
$$\lim_{n \to \infty} P\left( \left| \bar{X}_n - E(X) \right| > \varepsilon \right) = 0$$
Assuming that both E(X) and Var(X) are finite, prove the weak law of large numbers. You
can use Chebyshev’s inequality. It states that if Xi are n random variables i.i.d. as X, with
expected value E(X) and variance Var(X), then for every ε > 0 it holds that
$$P\left( \left| X - E(X) \right| \geq \varepsilon \right) \leq \frac{\mathrm{Var}(X)}{\varepsilon^2}.$$
You can also use the following facts. For all $a_i \in \mathbb{R}$,
$$E\left( \sum_i a_i X_i \right) = \sum_i a_i E(X_i)$$
and
$$\mathrm{Var}\left( \sum_i a_i X_i \right) = \sum_i a_i^2 \mathrm{Var}(X_i).$$
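The statement of the weak law can also be explored numerically before proving it. The following is only a sketch under assumptions that are not part of the sheet: X is taken to be a fair six-sided die (so E(X) = 3.5 and Var(X) = 35/12), and the function names are made up for this illustration. It estimates how often the empirical mean deviates from E(X) by more than ε and prints Chebyshev's bound applied to the empirical mean next to it. It does not replace the proof.

```python
import random


def empirical_mean(samples):
    """Return the empirical mean (1/n) * sum of the samples."""
    return sum(samples) / len(samples)


def deviation_frequency(n, eps, trials=200, seed=0):
    """Estimate P(|mean_n - E(X)| > eps) by simulation.

    ASSUMPTION: X is a fair six-sided die, chosen only for illustration.
    """
    rng = random.Random(seed)
    expected = 3.5  # E(X) for a fair die
    hits = 0
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(n)]
        if abs(empirical_mean(rolls) - expected) > eps:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps):
    """Chebyshev's bound for the empirical mean of n die rolls, capped at 1."""
    variance = 35 / 12  # Var(X) for a fair die
    return min(1.0, variance / (n * eps ** 2))


for n in (10, 100, 1000):
    print(n, deviation_frequency(n, eps=0.25), chebyshev_bound(n, eps=0.25))
```

The estimated frequency should shrink as n grows, and the bound in the last column shrinks with it.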
(a) Conditional probabilities refer to the probability distribution of one variable given another
one. It is denoted by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A, B)}{P(B)},$$
which reads as the probability of A given B. For example in Exercise 1 we have that
P (Y = Large | X = Male) = 0.5. Calculate the probability P (Y = Medium | X = Female).
(b) When are two random variables X and Y independent? Name two characterizations.
(c) The Bayes theorem states that
$$P(B = b \mid A = a) = \frac{P(A = a \mid B = b)\, P(B = b)}{P(A = a)}$$
Let A be the test result for cancer screening, which can be negative or positive, and let B indicate
whether the tested patient has cancer or not. The probability of having cancer is 1% and the
test is accurate with 95% probability, which for this exercise means that
P (B = cancer | A = positive).
Are you surprised by the result? Can you give an informal explanation of why we obtain such a
result?
(d) The odds of having cancer are given by
$$O(B = \text{cancer}) = \frac{P(B = \text{cancer})}{P(B = \text{no cancer})}.$$
This quantity states how many cancer patients you have to expect per person without the
disease. The Bayes factor is given by
$$\frac{P(A = \text{positive} \mid B = \text{cancer})}{P(A = \text{positive} \mid B = \text{no cancer})}$$
and states how much more likely it is to get a positive test result given that a person has cancer
compared to when they have no
cancer. Can you state the updated odds after a positive test result
$$O(B = \text{cancer} \mid A = \text{positive}) = \frac{P(B = \text{cancer} \mid A = \text{positive})}{P(B = \text{no cancer} \mid A = \text{positive})}$$
in terms of O(B = cancer) and the Bayes factor? Why is this view valuable?
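To check your results for parts (c) and (d) numerically, a small sketch like the following may help. It rests on an assumption that the text above does not spell out: "95% accurate" is read here as P(A = positive | B = cancer) = 0.95 and P(A = negative | B = no cancer) = 0.95. The function names are hypothetical and chosen only for this illustration.

```python
def posterior(prior, sensitivity, specificity):
    """P(B = cancer | A = positive) via Bayes' theorem."""
    true_pos = sensitivity * prior                     # P(positive | cancer) P(cancer)
    false_pos = (1.0 - specificity) * (1.0 - prior)    # P(positive | no cancer) P(no cancer)
    return true_pos / (true_pos + false_pos)


def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


prior = 0.01        # P(B = cancer), from the exercise
sensitivity = 0.95  # ASSUMPTION: P(A = positive | B = cancer)
specificity = 0.95  # ASSUMPTION: P(A = negative | B = no cancer)

post = posterior(prior, sensitivity, specificity)
bayes_factor = sensitivity / (1.0 - specificity)

print("posterior probability:", post)
print("posterior odds:", odds(post))
print("prior odds * Bayes factor:", odds(prior) * bayes_factor)
```

Comparing the last two printed values is a quick way to test whatever relation you derive in part (d).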
(a) Note that the product $Ax$ for any arbitrary matrix $A \in \mathbb{R}^{3 \times 3}$ and any $x \in \mathbb{R}^3$ can always be
written as a linear combination of the column vectors of A with the elements of x as coefficients.
Let $x^T = (x_1, x_2, x_3) \in \mathbb{R}^3$. Write down the explicit form of $Ax$ as a linear combination of the
column vectors of A.
(b) Do the columns of A form a basis of $\mathbb{R}^3$? Answer the same question for the rows of A.
(c) Now consider the system of linear equations $Ax = b$, where $x \in \mathbb{R}^3$, $b \in \mathbb{R}^3$. Try to find a $\tilde{b} \in \mathbb{R}^3$
(if such a $\tilde{b}$ exists) so that there is no real-valued solution x to the linear system $Ax = \tilde{b}$. If
such a $\tilde{b}$ does not exist then explain why. Try to understand the answer to this question in
terms of the expression in part (a) of the exercise.
Now consider $b^T = (2, 3, 12)$. Find x that satisfies the relation $Ax = b$.
(d) Just for the sake of completeness, what is the column rank, row rank and rank of the matrix
A? Write one line justifying/explaining your answers.
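The column view from part (a) can be checked numerically. The sketch below uses a made-up 3×3 matrix M and vector x, which are not the matrix A of this exercise, and compares the usual row-by-row product with the sum of columns weighted by the entries of x. Function names are hypothetical.

```python
def matvec(A, x):
    """Row-by-row matrix-vector product Ax."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def column_combination(A, x):
    """Ax written as x_1 * a_1 + x_2 * a_2 + ... over the columns a_j of A."""
    n_rows = len(A)
    result = [0.0] * n_rows
    for j, x_j in enumerate(x):
        column = [A[i][j] for i in range(n_rows)]
        result = [r + x_j * c for r, c in zip(result, column)]
    return result


# Hypothetical example data, NOT the matrix A from the exercise.
M = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 3.0, 1.0]]
x = [2.0, -1.0, 3.0]

print(matvec(M, x))
print(column_combination(M, x))
```

Both calls print the same vector, because the two functions only reorder the same sums.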
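$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$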
You can find a Java applet to visualize the eigenvectors of 2×2 matrices at this link:
https://www.geogebra.org/m/KuMAuEnd. Enable Java in your browser in order to access the
applet. Using the applet, scale and rotate the vector x in order to identify the independent
eigenvectors and eigenvalues of the three matrices A, B, C. Explain briefly what you observe
in the case of matrix C.
(b) $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix, with a set of eigenvectors $u_1, \dots, u_n$ with corresponding
eigenvalues $\lambda_1, \dots, \lambda_n$. Derive the eigenvectors and the eigenvalues of the following matrices in
terms of eigenvectors and eigenvalues of A.
(1) $A + \alpha I$, where I is the identity matrix of size n and $\alpha \in \mathbb{R}$
(2) $A^T A$
(3) $A A^T$
(4) If in addition A is a non-singular matrix, then find the eigenvectors and eigenvalues of
$A^{-1}$.
(c) Let $S \in \mathbb{R}^{m \times n}$ with $m \neq n$. Identify the components of the singular value decomposition
(SVD) of S given that we have the eigendecomposition of the square symmetric matrices $S^T S$
and $S S^T$.
(d) To have a better understanding of eigenvalues/vectors and SVD we recommend the following
video: https://www.youtube.com/watch?v=PFDu9oVAE-g. The whole series is worth watching
to gain a better geometric understanding of linear algebra.
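If you want to double-check candidate eigenvectors and eigenvalues without the applet, a direct test of the defining relation Mv = λv is enough. The sketch below is a numerical check, not a derivation; the helper names and the value of α are made up for this illustration, and the candidate pairs for B are simply tested against the definition.

```python
def matvec(M, v):
    """Matrix-vector product Mv."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def is_eigenpair(M, v, lam, tol=1e-12):
    """Check M v = lam * v component-wise for a nonzero vector v."""
    if all(abs(c) <= tol for c in v):
        return False  # the zero vector is never an eigenvector
    return all(abs(a - lam * b) <= tol for a, b in zip(matvec(M, v), v))


def shift(M, alpha):
    """Return M + alpha * I."""
    n = len(M)
    return [[M[i][j] + (alpha if i == j else 0.0) for j in range(n)]
            for i in range(n)]


B = [[1.0, 2.0], [2.0, 1.0]]                             # matrix B from part (a)
candidates = [([1.0, 1.0], 3.0), ([1.0, -1.0], -1.0)]    # candidate (vector, value) pairs
alpha = 0.5                                              # arbitrary illustrative shift

for v, lam in candidates:
    print(v, lam, is_eigenpair(B, v, lam),
          is_eigenpair(shift(B, alpha), v, lam + alpha))
```

The same `is_eigenpair` helper can be pointed at the matrices A and C to test the vectors you find with the applet.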
During this course you will be required to implement some of the algorithms presented in class.
We will use Python and in particular Jupyter notebooks. In order to save time you should install
Python and all the required packages on your laptop. Here you will find instructions on how to
do so. We present two methods. The first one is the easiest and is recommended if you do not
have any previous experience with Python. The second one is more suitable for those who know pip.
Installation
1) Anaconda. Follow the instructions at the link below that matches your operating system.
When you need to decide what to download, please download ANACONDA and not MINICONDA.
Furthermore, download the “Python 3.X version”, NOT the “Python 2.X version”.
• Windows: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html
• MacOS: https://conda.io/projects/conda/en/latest/user-guide/install/macos.html
• Linux: https://conda.io/projects/conda/en/latest/user-guide/install/linux.html
2) Pip. We will use the following packages: numpy, scikit-learn, pandas, matplotlib, jupyter.
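Whichever method you choose, you can check from Python whether the packages are visible in your environment. The snippet below is a minimal sketch, not part of the provided notebook. The list holds import names, which can differ from the names used for installation (scikit-learn is imported as sklearn), and it assumes that a Jupyter installation exposes a top-level module called jupyter.

```python
import importlib.util

# Import names of the packages listed above.
# scikit-learn is imported as "sklearn".
# ASSUMPTION: a Jupyter installation provides a top-level module "jupyter".
REQUIRED = ["numpy", "sklearn", "pandas", "matplotlib", "jupyter"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the packages can be located. It does not import them or verify their versions.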
Test
Now it is time to see if everything we need is installed. Together with this sheet you should have
a file named Assignment 1.ipynb. We will use it to test that everything is correctly installed.
First we need to launch Jupyter. How to do this depends on your operating system:
• MacOS/Linux: Open a terminal in the folder that contains the Assignment 1.ipynb file and
run jupyter notebook
Once Jupyter is running, navigate your folder structure until you find the Assignment 1.ipynb
file and click on it. Once it is open, please click on Cell → Run All. If it says that you are ready
to go then you are ready to go. Otherwise ask for help at the tutorial.