Additional Exercises: Data Science
1. The table below lists a dataset that was used to create a nearest neighbour model
that predicts whether it will be a good day to go surfing.
Assuming that the model uses Euclidean distance to find the nearest neighbour,
what prediction will the model return for each of the following query instances?
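As a starting point for question 1, the following is a minimal sketch of a nearest neighbour lookup with Euclidean distance. The function names and the toy data are hypothetical illustrations; they are not the surfing table or the query instances from the exercise, which must be substituted in.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(training, query):
    """Return the target level of the training instance closest to the query.

    `training` is a list of (feature_vector, target) pairs.
    """
    _, target = min(training, key=lambda row: euclidean(row[0], query))
    return target

# Hypothetical toy data (assumption), NOT the dataset from the exercise.
toy_training = [
    ((1.0, 2.0, 3.0), "no"),
    ((4.0, 6.0, 1.0), "yes"),
    ((7.0, 8.0, 9.0), "no"),
]
toy_query = (4.0, 5.0, 2.0)

prediction = nearest_neighbour(toy_training, toy_query)
print(prediction)
```

Because Euclidean distance sums squared differences across all features, features measured on larger scales dominate the result unless the data is normalised first.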
What target level would a nearest neighbor model using Euclidean distance
return for the following email: “machine learning for free”?
What target level would a k-NN model with k=3 and using Euclidean distance
return for the same query?
What target level would a weighted k-NN model with k=5, using a weighting
scheme of the reciprocal of the squared Euclidean distance between the neighbor
and the query, return for the query?
What target level would a k-NN model with k=3 and using Manhattan distance
return for the same query?
There are a lot of zero entries in the spam bag-of-words dataset. This is indicative
of sparse data and is typical for text analytics. Cosine similarity is often a good
choice when dealing with sparse non-binary data. What target level would a
3-NN model using cosine similarity return for the query?
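The k-NN variants asked about above (majority vote, reciprocal-squared-distance weighting, Manhattan distance, cosine similarity) can be sketched with one small function. This is only an illustration under stated assumptions: the vocabulary, the toy emails and the query vector are hypothetical and do not reproduce the spam dataset from the exercise. Cosine similarity is turned into a distance (1 minus similarity) so that every metric ranks neighbours in the same "smaller is closer" way.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(training, query, k, distance=euclidean, weighted=False):
    """Vote over the k nearest neighbours; optionally weight votes by 1/d^2."""
    neighbours = sorted(training, key=lambda row: distance(row[0], query))[:k]
    votes = defaultdict(float)
    for features, target in neighbours:
        if weighted:
            d = distance(features, query)
            # Reciprocal of the squared distance; an exact match gets an infinite vote.
            votes[target] += 1.0 / d ** 2 if d > 0 else math.inf
        else:
            votes[target] += 1.0
    return max(votes, key=votes.get)

# Hypothetical bag-of-words counts (assumption), NOT the exercise dataset.
# Assumed vocabulary order: ("free", "money", "machine", "learning")
toy_emails = [
    ((3, 2, 0, 0), "spam"),
    ((1, 3, 0, 0), "spam"),
    ((0, 0, 2, 2), "ham"),
    ((0, 0, 1, 3), "ham"),
    ((2, 1, 0, 0), "spam"),
]
toy_query = (1, 0, 1, 1)  # e.g. "machine learning for free" under the assumed vocabulary

predictions = {
    "1-NN Euclidean": knn_predict(toy_emails, toy_query, k=1),
    "3-NN Euclidean": knn_predict(toy_emails, toy_query, k=3),
    "weighted 5-NN": knn_predict(toy_emails, toy_query, k=5, weighted=True),
    "3-NN Manhattan": knn_predict(toy_emails, toy_query, k=3, distance=manhattan),
    "3-NN cosine": knn_predict(toy_emails, toy_query, k=3, distance=cosine_distance),
}
for name, label in predictions.items():
    print(name, "->", label)
```

With k=5 and only five training instances, every instance votes, so the weighting scheme alone decides the outcome. That is the point of the weighted variant in the exercise.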
3. You have been hired by the European Space Agency to build a model that predicts
the amount of oxygen that an astronaut consumes when performing five minutes of
intense physical work. The descriptive features for the model will be the age of the
astronaut and their average heart rate throughout the work. The regression model is
oxygen consumption = w[0] + w[1] × age + w[2] × heart rate
The table below shows a historical dataset that has been collected for this task.
Assuming that the current weights in a multivariate linear regression model
are w[0] = 59.50, w[1] = 0.15, and w[2] = 0.60, make a prediction for each
training instance using this model.
Calculate the sum of squared errors for the set of predictions generated in the
previous question.
Assuming a learning rate of 0.000002, calculate the weights at the next
iteration of the gradient descent algorithm.
Calculate the sum of squared errors for a set of predictions generated using
the new set of weights calculated in the previous question.
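The four steps above (predict, compute the sum of squared errors, update the weights, recompute the error) can be sketched as follows. Several things here are assumptions rather than part of the exercise: the model is taken to be w[0] + w[1] × age + w[2] × heart rate, the update is taken to be batch gradient descent of the form w[j] ← w[j] + α × Σ (target − prediction) × d[j] with d[0] = 1, and the three data rows are hypothetical stand-ins for the historical table. Only the starting weights and the learning rate come from the questions.

```python
def predict(weights, age, heart_rate):
    """Linear model: w[0] + w[1] * age + w[2] * heart_rate (assumed form)."""
    w0, w1, w2 = weights
    return w0 + w1 * age + w2 * heart_rate

def sum_of_squared_errors(weights, rows):
    return sum((target - predict(weights, age, hr)) ** 2
               for age, hr, target in rows)

def gradient_descent_step(weights, rows, learning_rate):
    """One batch update: w[j] += learning_rate * sum(error_i * d_i[j])."""
    errors = [target - predict(weights, age, hr) for age, hr, target in rows]
    inputs = [(1.0, age, hr) for age, hr, _ in rows]  # d[0] = 1 for the intercept
    return tuple(
        w + learning_rate * sum(e * x[j] for e, x in zip(errors, inputs))
        for j, w in enumerate(weights)
    )

# Hypothetical rows (age, heart rate, oxygen consumption); NOT the exercise table.
toy_rows = [
    (40, 130, 40.0),
    (50, 150, 45.0),
    (45, 140, 42.0),
]

weights = (59.50, 0.15, 0.60)   # starting weights given in the exercise
learning_rate = 0.000002        # learning rate given in the exercise

old_sse = sum_of_squared_errors(weights, toy_rows)
new_weights = gradient_descent_step(weights, toy_rows, learning_rate)
new_sse = sum_of_squared_errors(new_weights, toy_rows)

print("SSE before:", old_sse)
print("new weights:", new_weights)
print("SSE after:", new_sse)
```

If the course defines the error as half the sum of squared errors, the update rule above is exactly its negative gradient scaled by the learning rate. If the plain sum is differentiated instead, an extra factor of 2 appears, so check which convention applies before comparing numbers.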