
HW2P

October 23, 2024

1 Assignment 2: Bayesian Sequential Estimation and K-means


This assignment covers: Bayesian Sequential Estimation and K-means
There are hidden tests within this assignment to grade the accuracy of your work.
PLEASE DO NOT CHANGE THE NAME OF THIS FILE OR ADD/DELETE THE
CELLS.
Please run the following cell to import the appropriate libraries:
[1]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

1.1 Q1: Bayesian Sequential Estimation and the Effects of Priors [8 points]
In this problem, we consider sequential Bayesian estimation of a coin’s bias weighting for heads. Beginning with a Beta prior distribution (Bishop Equation 2.13) over the bias 𝜇, you will simulate observing successive coin flips and plot the posterior distribution of 𝜇 obtained by applying Bayes’ Theorem (Bishop Equation 1.43). This material is covered in Bishop, pages 71-74.
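The updating itself is simple because the Beta prior is conjugate to the Bernoulli likelihood: after observing h heads and t tails, the posterior over 𝜇 is again a Beta distribution with the counts added to the hyperparameters,

$$p(\mu \mid h, t) = \mathrm{Beta}(\mu \mid a + h,\; b + t).$$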
To explore the effect that different initial prior distributions have, you will perform this sequential estimation process for three initial priors in parallel:
• (a = 1, b = 1)
• (a = 0.5, b = 0.5)
• (a = 50, b = 50)
You will simulate flipping a coin with a 1/4 probability of coming up heads on each toss. This question is divided into two parts. The first deals with the behavior of the three distributions over the first few flips; the second deals with their behavior after several thousand flips. For each part, you will create a collection of subplots to visualize the evolution of the posterior distributions over time.
First, in the cell below, we define the helper function plotbetapdfs, which is used to plot the Beta distributions for our prior and posterior estimates.

[2]: def plotbetapdfs(ab, sp_idx, tally):
         """
         Inputs:
           ab: 3-by-2 matrix containing the a,b parameters for the priors/posteriors
               Initial entries in the matrix:
                 ab[0,:] = [1, 1]
                 ab[1,:] = [0.5, 0.5]
                 ab[2,:] = [50, 50]

           sp_idx: 3-element array that specifies in which subplot to plot the
               current distributions specified by the (a,b) pairs in ab.

           tally: 2-element array (# heads, # tails) containing a running count
               of the observed number of heads and tails.
               Initial entry in the array:
                 tally = [0, 0]
         """

         num_rows = ab.shape[0]
         xs = np.arange(.001, 1, .001)

         plt.subplots_adjust(left=0, right=1, bottom=0, top=2, wspace=0.5, hspace=0.5)
         plt.rc('text', usetex=True)
         plt.rc('font', family='serif')
         plt.subplot(sp_idx[0], sp_idx[1], sp_idx[2])

         mark = ['-', ':', '--']

         for row in range(num_rows):
             a = ab[row, 0]
             b = ab[row, 1]
             marker = mark[row]

             vals = beta.pdf(xs, a, b)
             plt.plot(xs, vals, marker)

         axes = plt.gca()
         axes.set_xlim([0, 1])
         axes.set_ylim([0, 20])
         plt.title('{:d} h, {:d} t'.format(*tally))
         plt.xlabel(r'Bias weighting for heads $\mu$')
         plt.ylabel(r'$p(\mu|\{data\},I)$')
         plt.legend([r'$\alpha$={:g}, $\beta$={:g}'.format(a, b) for a, b in ab],
                    loc="upper right", prop={'size': 6})

1.1.1 Part 1: Behavior for first 5 flips [5 points]
In this part, you will perform a total of 5 flips. You will produce a 3-by-2 matrix of subplots: one plot for the prior and one for the state after each flip.
First:
1. Initialize the 3-by-2 matrix ab, which contains the initial hyperparameters of the beta distributions, as a numpy array.
2. Initialize the numpy array tally to store counts of heads and tails as [#heads, #tails]. HINT: initially there are no heads or tails counted.
[3]: # initializing initial hyperparameters ab as a matrix
     ab = np.array(...)

     # creating tally variable to count # of heads and tails
     tally = np.array(...)

     ### BEGIN SOLUTION
     ab = np.array([[1, 1], [0.5, 0.5], [50, 50]])
     tally = np.array([0, 0])
     ### END SOLUTION

     # initial subplot index
     sp_idx = (3, 2, 1)

[4]: assert ab.shape == (3, 2)
     assert tally.shape == (2,)
     ### BEGIN HIDDEN TESTS
     assert np.allclose(ab, [[1, 1], [0.5, 0.5], [50, 50]])
     assert np.array_equal(tally, [0, 0])
     ### END HIDDEN TESTS

Now, using plotbetapdfs, plot the initial priors.

[5]: ### BEGIN SOLUTION
     plotbetapdfs(ab, sp_idx, tally)
     ### END SOLUTION

Now we will simulate 5 flips with a bias of 𝜇 = 0.25 and update the tally. To do this we must:
1. Initialize the probability array p to contain the probabilities of heads and tails as [P(heads), P(tails)].
2. Initialize flips_tally as a 5-by-2 numpy array of zeros to store the CUMULATIVE heads/tails counts at each flip.
3. Simulate 5 coin flips using np.random.choice and update flips_tally accordingly (a minimal illustration of np.random.choice follows this list).
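As a quick illustration (this snippet is not part of the assignment scaffold), np.random.choice draws from the listed values with the probabilities given in p, so a single simulated flip looks like:

    import numpy as np
    p = [0.25, 0.75]                          # [P(heads), P(tails)]
    outcome = np.random.choice([1, 0], p=p)   # 1 = heads with prob 0.25, 0 = tails with prob 0.75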
[6]: # Initialize probability distribution
     p = ...

     # Reinitialize tally as a 5 x 2 numpy array of zeros.
     # HINT: use np.zeros with dtype=int
     flips_tally = ...

     ### BEGIN SOLUTION
     p = [0.25, 0.75]
     flips_tally = np.zeros((5, 2), dtype=int)
     ### END SOLUTION

[7]: assert sum(p) == 1
     assert flips_tally.shape == (5, 2)
     ### BEGIN HIDDEN TESTS
     assert np.allclose([0.25, 0.75], p)
     assert np.all(flips_tally == 0)
     ### END HIDDEN TESTS

4
Run the loop below ONCE to simulate 5 coin flips and print the updated flips_tally.
[8]: # Simulate 5 coin flips while updating flips_tally.
     np.random.seed(1)
     for i in range(5):
         coin_flip_outcome = np.random.choice([1, 0], p=p)

         if coin_flip_outcome == 1:
             flips_tally[i, 0] = 1
         else:
             flips_tally[i, 1] = 1

         # make the counts cumulative; at i = 0 this adds row -1 (the last row),
         # which is still all zeros, so the wraparound is harmless
         flips_tally[i] += flips_tally[i-1]

     # Print flips_tally after the coin flips
     print(flips_tally)

[[0 1]
[0 2]
[1 2]
[1 3]
[2 3]]
Now complete the loop below to update and plot the posterior distributions after each flip. After each subsequent flip, store the updated posterior hyperparameters in the variable updated_ab:
[9]: sp_idx

[9]: (3, 2, 1)

[10]: # Given: replot initial priors
      plotbetapdfs(ab, sp_idx, tally)

      # For each flip, update ab using the Bayesian update rule, then use
      # plotbetapdfs with sp_idx to plot the distributions:
      for i in range(1, 6):
          sp_idx = (3, 2, i + 1)
          updated_ab = ...
          #plotbetapdfs(...)

          ### BEGIN SOLUTION
          updated_ab = ab + np.expand_dims(np.array(flips_tally[i-1]), axis=0)
          plotbetapdfs(updated_ab, sp_idx, flips_tally[i-1])
          ### END SOLUTION
          print(updated_ab)

[[ 1.   2. ]
 [ 0.5  1.5]
 [50.  51. ]]
[[ 1.   3. ]
 [ 0.5  2.5]
 [50.  52. ]]
[[ 2.   3. ]
 [ 1.5  2.5]
 [51.  52. ]]
[[ 2.   4. ]
 [ 1.5  3.5]
 [51.  53. ]]
[[ 3.   4. ]
 [ 2.5  3.5]
 [52.  53. ]]
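As a sanity check, the final row of output agrees with the conjugate update applied to the ending tally of 2 heads and 3 tails; for the flat prior,

$$\mathrm{Beta}(\mu \mid 1 + 2,\; 1 + 3) = \mathrm{Beta}(\mu \mid 3, 4),$$

and likewise Beta(0.5, 0.5) becomes Beta(2.5, 3.5) and Beta(50, 50) becomes Beta(52, 53).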

[11]: assert updated_ab.shape == (3, 2)
      ### BEGIN HIDDEN TESTS
      assert (updated_ab == np.array([[3., 4.], [2.5, 3.5], [52., 53.]])).all()
      ### END HIDDEN TESTS

Your plots should look something like this:

1.1.2 Part 2: Long-Term Behavior of Posterior Distributions [4 Points]

In this part, you will simulate 2048 coin flips. You will produce a 4-by-3 matrix of subplots that visualizes the evolution of the posterior distributions. As in part one, the first subplot shows the prior distributions before any flips. The rest show the posterior distributions when the number of flips reaches each power of 2: 2^i for i = 1, ..., 11.
To do this, you will:
1. Plot the initial prior distributions using plotbetapdfs by resetting ab and tally to their initial values. HINT: reuse the same code from part one.
2. Simulate the next 2048 coin flips, updating the distributions and plotting them every time the number of flips is a power of 2. You are provided with a loop that does so, but you must:
• Define the list intervals, which contains all powers of 2 up to 2048: [2^1, …, 2^11].
• Reinitialize flips_tally as a numpy array of appropriate size to store the cumulative heads/tails counts at each interval. It should contain only zeros. Use np.zeros again.
[12]: # initializing initial hyperparameters ab as a matrix
      ab = np.array(...)

      # creating tally variable to count # of heads and tails
      tally = np.array(...)

      # Initial subplot index
      sp_idx = (4, 3, 1)

      # Plot initial prior distributions using plotbetapdfs with the above sp_idx.
      # Fill out and uncomment the line below
      # plotbetapdfs(...)

      # Initialize intervals as a list of powers of 2: 2^i for i = 1, ..., 11
      intervals = ...

      # Create flips_tally as a zero'd numpy array of size len(intervals) x 2 to
      # count # of heads and tails
      flips_tally = ...

      ### BEGIN SOLUTION
      ab = np.array([[1, 1], [0.5, 0.5], [50, 50]])
      tally = np.array([0, 0])

      plotbetapdfs(ab, sp_idx, tally)

      intervals = [2**i for i in range(1, 12)]
      flips_tally = np.zeros((len(intervals), 2), dtype=int)
      ### END SOLUTION

      # Simulate coin flips
      np.random.seed(1)
      for i in range(len(intervals)):
          # calculate number of flips as the difference between successive elements in intervals
          num_flips = intervals[i] if i == 0 else intervals[i] - intervals[i-1]

          # simulate num_flips coin flips and accumulate the number of heads and tails
          flips = np.random.choice([1, 0], size=num_flips, p=p)
          heads = np.sum(flips)
          tails = num_flips - heads

          # update flips_tally with those values
          flips_tally[i, 0] = heads
          flips_tally[i, 1] = tails

          # ensure flips_tally is cumulative
          if i > 0:
              flips_tally[i] += flips_tally[i-1]

      # Update distributions and plot
      for i in range(1, len(intervals) + 1):
          sp_idx = (4, 3, i + 1)

          # update ab
          updated_ab = ab + np.expand_dims(np.array(flips_tally[i-1]), axis=0)

          # plot
          plotbetapdfs(updated_ab, sp_idx, flips_tally[i-1])

[13]: assert intervals[0] == 2 and intervals[-1] == 2048 and len(intervals) == 11 and sum(intervals) == 4094
      assert len(flips_tally) == len(intervals)

      ### BEGIN HIDDEN TESTS
      assert intervals == [2**i for i in range(1, 12)]
      assert (flips_tally == [[0, 2], [1, 3], [4, 4], [6, 10], [12, 20], [20, 44], [36, 92], [71, 185], [131, 381], [256, 768], [502, 1546]]).all()
      ### END HIDDEN TESTS

Your plots should look something like this:

1.2 Multiple Choice Section [4 Points]

1.2.1 Think about what the plots of the posterior parameter distributions for the first 5 flips tell you about the difference between the Beta(a=50, b=50) prior and the other two priors.

It may help to think about the interpretation of the prior parameters as “fake” data, as per Bishop page 72.
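Concretely, a Beta(a, b) prior has mean

$$\mathbb{E}[\mu] = \frac{a}{a + b},$$

and a and b can be read as effective prior observations of heads and tails respectively, so a + b measures how much “fake” data the prior carries. All three priors above have mean 1/2; they differ only in a + b.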
1. Which prior represents having more “fake” data? [0.8 pts]
A. Beta(a=50, b=50)
B. Beta(a=1, b=1)

[14]: ans_1 = ... # 'A', 'B'

      ### BEGIN SOLUTION
      ans_1 = "A"
      ### END SOLUTION

[15]: ### BEGIN HIDDEN TESTS
      assert ans_1 == "A"
      ### END HIDDEN TESTS

2. What does having more “fake” data mean? [0.8 pts]
A. It is a “stronger” prior, meaning it takes more data to move away from the prior
B. It is a “weaker” prior, meaning it takes less data to move away from the prior

[16]: ans_2 = ... # 'A', 'B'
      ### BEGIN SOLUTION
      ans_2 = "A"
      ### END SOLUTION

[17]: ### BEGIN HIDDEN TESTS
      assert ans_2 == "A"
      ### END HIDDEN TESTS

3. What does having a=b in our priors mean? [0.8 pts]
A. I don’t know
B. It means that we do not believe that either heads or tails is more likely than the other
[18]: ans_3 = ... # 'A', 'B'
### BEGIN SOLUTION
ans_3 = "B"
### END SOLUTION

[19]: ### BEGIN HIDDEN TESTS
      assert ans_3 == "B"
      ### END HIDDEN TESTS

4. If we believed that heads were more likely than tails, which setting should we choose? [0.8 pts]
A. a > b
B. a < b
C. a = b
[20]: ans_4 = ... # 'A', 'B', 'C'
### BEGIN SOLUTION
ans_4 = "A"
### END SOLUTION

[21]: ### BEGIN HIDDEN TESTS
      assert ans_4 == "A"
      ### END HIDDEN TESTS

5. Why do the plots of the posterior distributions after several thousand flips (for the different
priors) look so similar? [0.8 pts]
A. Because with lots of data, the prior becomes less important
B. Because we made a coding error
[22]: ans_5 = ... # 'A', 'B'
### BEGIN SOLUTION
ans_5 = "A"
### END SOLUTION

[23]: ### BEGIN HIDDEN TESTS
      assert ans_5 == "A"
      ### END HIDDEN TESTS

1.3 Q2: Implement K-means [7.5 points]

In this problem you will implement the K-means algorithm as covered in Bishop, pages 424-425. You are given a main function runKMeans, a plotting helper function plotCurrent, and the data file scaledfaithful.txt. Your task is to provide code for the functions called by runKMeans:
• calcSqDistances (Possibly helpful functions: np.dot, np.sum)
• determineRnk (Possibly helpful functions: np.argmin, np.eye, .shape)
• recalcMus (Possibly helpful functions: np.dot, np.divide, np.sum)
Note: If you find that you have written much more than 10 lines for any of the above functions, you should try to rethink your approach.
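For reference, K-means seeks assignments r_nk and cluster centers 𝜇_k that minimize the distortion measure of Bishop Eq. (9.1),

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2,$$

alternating between the assignment step (calcSqDistances followed by determineRnk) and the re-estimation step (recalcMus).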
[24]: # Defining the plotting helper function
      def plotCurrent(X, Rnk, Kmus):
          N, D = np.shape(X)
          K = np.shape(Kmus)[0]

          InitColorMat = np.matrix([[1, 0, 0],
                                    [0, 1, 0],
                                    [0, 0, 1],
                                    [0, 0, 0],
                                    [1, 1, 0],
                                    [1, 0, 1],
                                    [0, 1, 1]])

          KColorMat = InitColorMat[0:K]
          colorVec = Rnk.dot(KColorMat)
          muColorVec = np.eye(K).dot(KColorMat)

          plt.scatter(X[:, 0], X[:, 1], edgecolors=colorVec, marker='o', facecolors='none', alpha=0.3)
          plt.scatter(Kmus[:, 0], Kmus[:, 1], c=muColorVec, marker='D', s=50)

[25]: def calcSqDistances(X, Kmus):
          ### BEGIN SOLUTION
          return ((-2 * X.dot(Kmus.T) + np.sum(np.multiply(Kmus, Kmus), axis=1).T).T + np.sum(np.multiply(X, X), axis=1)).T
          ### END SOLUTION
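The one-line solution above vectorizes the standard expansion of the squared Euclidean distance, evaluated for every (n, k) pair at once via broadcasting:

$$\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 = \lVert \mathbf{x}_n \rVert^2 - 2\, \mathbf{x}_n^{\top} \boldsymbol{\mu}_k + \lVert \boldsymbol{\mu}_k \rVert^2.$$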

[26]: def determineRnk(sqDmat):
          ### BEGIN SOLUTION
          m = np.argmin(sqDmat, axis=1)
          return np.eye(sqDmat.shape[1])[m]
          ### END SOLUTION
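An illustrative example with a made-up 2-by-2 distance matrix (not from the assignment data): indexing the identity matrix by the argmin indices yields one-hot rows, which is exactly the binary responsibility matrix:

    sqDmat = np.array([[0.2, 1.0],
                       [0.9, 0.1]])
    determineRnk(sqDmat)
    # array([[1., 0.],    point 0 is closest to cluster 0
    #        [0., 1.]])   point 1 is closest to cluster 1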

[27]: def recalcMus(X, Rnk):
          ### BEGIN SOLUTION
          return (np.divide(X.T.dot(Rnk), np.sum(Rnk, axis=0))).T
          ### END SOLUTION
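This implements the center update of Bishop Eq. (9.4): each new mean is the average of the points currently assigned to that cluster,

$$\boldsymbol{\mu}_k = \frac{\sum_n r_{nk}\, \mathbf{x}_n}{\sum_n r_{nk}}.$$

In the code, X.T.dot(Rnk) forms the D-by-K matrix of per-cluster sums, and np.sum(Rnk, axis=0) gives the cluster sizes. (If a cluster ever received no points, this division would produce NaNs; initializing the centers from randomly chosen data points makes that unlikely here.)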

[28]: def runKMeans(K, fileString):
          fig = plt.gcf()

          # load data file specified by fileString from Bishop book
          X = np.loadtxt(fileString, dtype='float')

          # determine and store data set information
          N, D = X.shape

          # allocate space for the K mu vectors
          Kmus = np.zeros((K, D))

          # initialize cluster centers by randomly picking points from the data
          rand_inds = np.random.permutation(N)
          Kmus = X[rand_inds[0:K], :]

          # specify the maximum number of iterations to allow
          maxiters = 1000

          for iter in range(maxiters):
              # assign each data vector to the closest mu vector as per Bishop (9.2)
              # do this by first calculating a squared distance matrix where the n,k entry
              # contains the squared distance from the nth data vector to the kth mu vector
              # sqDmat will be an N-by-K matrix with the n,k entry as specified above
              sqDmat = calcSqDistances(X, Kmus)

              # given the matrix of squared distances, determine the closest cluster
              # center for each data vector
              # Rnk is the "responsibility" matrix: an N-by-K matrix of binary values
              # whose n,k entry is set as per Bishop (9.2)
              # Specifically, the n,k entry is 1 if point n is closest to cluster k,
              # and is 0 otherwise
              Rnk = determineRnk(sqDmat)

              KmusOld = Kmus
              plotCurrent(X, Rnk, Kmus)
              plt.show()

              # Recalculate mu values based on cluster assignments as per Bishop (9.4)
              Kmus = recalcMus(X, Rnk)

              # Check to see if the cluster centers have converged. If so, break.
              if sum(abs(KmusOld.flatten() - Kmus.flatten())) < 1e-6:
                  break

          plotCurrent(X, Rnk, Kmus)
          return Kmus

Run runKMeans to verify the functionality:

[29]: np.random.seed(2)
      Kmus = runKMeans(4, 'scaledfaithful.txt')

[30]: assert Kmus.shape == (4, 2)

[31]: assert np.abs(Kmus[0,0] - 0.45345097) < 1e-6 and np.abs(Kmus[0,1] - 0.23322868) < 1e-6

[32]: assert np.abs(Kmus[1,0] - 0.52508499) < 1e-6 and np.abs(Kmus[1,1] - 1.09253179) < 1e-6

[33]: ### BEGIN HIDDEN TESTS
      assert np.abs(Kmus[2,0] - 0.99265969) < 1e-6 and np.abs(Kmus[2,1] - 0.78378766) < 1e-6
      assert np.abs(Kmus[3,0] - (-1.27009425)) < 1e-6 and np.abs(Kmus[3,1] - (-1.20649098)) < 1e-6
      ### END HIDDEN TESTS

1.4 The End of HW2P!

This is the end of HW2P.
Have a look back over your answers, and make sure to Restart Kernel and Run All Cells from the kernel menu to double-check that everything is working properly. This restarts everything and runs your code from top to bottom.
Once you’re happy with your work, click the disk icon to save, and submit on DataHub. You MUST submit all the required components to receive credit.
Note that you can submit at any time, but we grade your most recent submission. This means that if you submit an updated notebook after the submission deadline, it will be marked as late.
