ML Hw1
Instructions:
Grading Criteria:
(a) Explain why training a machine learning model can be formulated as an optimization
problem. What are the objectives and constraints involved?
(b) Provide examples of how optimization techniques are applied in the training of models
such as linear regression and logistic regression.
(c) Discuss the role of the loss (or cost) function in this context and how it guides the
optimization process.
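As a concrete reference for parts (a)–(c), here is a minimal Python sketch of the optimization view: linear regression trained by gradient descent on a mean-squared-error loss. The data, learning rate, and iteration count are illustrative assumptions, not assignment data.

```python
import numpy as np

# Hypothetical toy data: recover y = 3x + 0.5 by minimizing MSE.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + 0.5 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # parameters (the optimization variables)
lr = 0.1          # learning rate, an assumed hyperparameter
for _ in range(500):
    residual = w * X + b - y
    # Gradients of the MSE objective with respect to w and b
    grad_w = 2 * np.mean(residual * X)
    grad_b = 2 * np.mean(residual)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # should approach 3.0 and 0.5
```

Here the objective is the average squared residual, and the loss function's gradient tells the update step which direction reduces the error, which is the guidance part (c) asks about.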
Suppose we observe n i.i.d. samples x₁, …, xₙ from a normal distribution N(µ, σ²) with known variance σ².
(a) Derive the Maximum Likelihood Estimator (MLE) for the mean µ.
(b) Assume a prior distribution for µ that is also normally distributed, with mean µ₀ and variance τ². Derive the Maximum A Posteriori (MAP) estimator for µ.
(c) Compare the MLE and MAP estimators. Discuss how the choice of µ₀ and τ² affects the MAP estimator.
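A worked sketch of the standard derivation, under the reconstructed assumption above that σ² is known:

```latex
% Log-likelihood of n i.i.d. samples from N(mu, sigma^2):
\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2)
            - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2
% Setting d ell / d mu = 0 yields the MLE:
\hat{\mu}_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
% MAP adds the log-prior for mu ~ N(mu_0, tau^2) before maximizing:
\hat{\mu}_{\mathrm{MAP}}
  = \arg\max_{\mu} \left[ \ell(\mu) - \frac{(\mu - \mu_0)^2}{2\tau^2} \right]
  = \frac{n\tau^2\,\bar{x} + \sigma^2\,\mu_0}{n\tau^2 + \sigma^2}
```

Note that as τ² → ∞ the prior becomes uninformative and the MAP estimator approaches the MLE, while as τ² → 0 the estimate is pulled toward µ₀; this is the comparison part (c) asks about.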
Consider classifying documents into the Sports or Politics category. The training data yields the word counts below:

Word     | Sports Count | Politics Count
win      | 50           | 10
team     | 60           | 5
election | 15           | 70
vote     | 10           | 80
(a) Explain the Naive Bayes assumption and how it applies to text classification.
(b) Using the data above, calculate the probability that a document containing the words win
and vote belongs to the Sports category versus the Politics category. Assume uniform
class priors and apply Laplace smoothing with α = 1.
(c) Interpret the results and discuss any limitations of the Naive Bayes classifier in this
context.
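A sketch of part (b)'s calculation in Python, assuming the vocabulary consists of exactly the four words in the table:

```python
# Word counts from the table above; uniform class priors are assumed.
counts = {
    "Sports":   {"win": 50, "team": 60, "election": 15, "vote": 10},
    "Politics": {"win": 10, "team": 5,  "election": 70, "vote": 80},
}
alpha = 1       # Laplace smoothing parameter
vocab_size = 4  # the four words in the table
doc = ["win", "vote"]

scores = {}
for cls, word_counts in counts.items():
    total = sum(word_counts.values())
    score = 0.5  # uniform prior P(class)
    for word in doc:
        # Smoothed conditional probability P(word | class)
        score *= (word_counts[word] + alpha) / (total + alpha * vocab_size)
    scores[cls] = score

z = sum(scores.values())  # normalize to posterior probabilities
for cls, s in scores.items():
    print(f"P({cls} | doc) = {s / z:.3f}")
```

With these counts the posteriors come out close (roughly 0.48 for Sports versus 0.52 for Politics): "win" favors Sports about as strongly as "vote" favors Politics, which is worth addressing when interpreting the result in part (c).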
(a) Fit a linear regression model to the data and report the estimated parameters.
(c) Compare the training error and discuss which model is likely overfitting the data. Provide
visualizations to support your answer.
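The assignment's dataset is not reproduced here, so the sketch below uses synthetic stand-in data. It shows one way to produce the part (a) fit and the part (c) training-error comparison, with a degree-9 polynomial assumed as the competing, more flexible model:

```python
import numpy as np

# Synthetic stand-in data (the assignment's dataset is not reproduced here).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.size)

# Part (a): least-squares fit via the normal equations.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"linear fit: y = {slope:.2f} x + {intercept:.2f}")

# Part (c): training error of the linear fit versus a high-degree
# polynomial (assumed here as the competing model).
for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

A lower training error for the high-degree fit does not by itself indicate a better model; plotting both fitted curves against the data points makes the overfitting visible.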
(a) Explain the difference between L1 (Lasso) and L2 (Ridge) regularization in the context
of linear regression.
(b) Given a dataset in which multiple features are highly correlated, discuss which regularization method would be more appropriate and why.
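A minimal sketch of the behavior part (b) is probing, using scikit-learn on assumed synthetic data with two nearly identical features:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# x2 is x1 plus small noise, so the two features are highly correlated;
# the target depends only on their shared signal.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

for model in (Lasso(alpha=0.1), Ridge(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))

# Typically Lasso drives one of the correlated coefficients to (near) zero,
# while Ridge spreads comparable weight across both.
```

The printed coefficients illustrate the contrast between L1's sparsity-inducing penalty and L2's tendency to shrink correlated coefficients together.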