ML Unit - 3
Statistical Learning:
Bayesian Reasoning
After collecting data, Bayesian inference involves updating the prior beliefs
using Bayes' theorem to obtain the posterior probabilities. Bayes' theorem
mathematically combines the prior probability, likelihood of the observed data
given the hypothesis, and the probability of the data. The posterior probability
represents the updated belief or probability of the hypothesis or parameter after
considering the observed evidence.
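To make the update rule concrete (the notation here is illustrative, not taken from the notes), Bayes' theorem can be written as
P(H | D) = P(D | H) * P(H) / P(D)
where P(H) is the prior probability of the hypothesis H, P(D | H) is the likelihood of the observed data D given the hypothesis, P(D) is the probability of the data, and P(H | D) is the resulting posterior probability. For example, with a prior P(H) = 0.3, a likelihood P(D | H) = 0.8, and P(D) = 0.5, the posterior is (0.8 * 0.3) / 0.5 = 0.48.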
1. Training Phase: During the training phase, the k-NN classifier stores the
feature vectors and corresponding labels of the training instances. The feature
vectors represent the attributes or characteristics of the data points, and the
labels indicate their respective classes or categories.
2. Distance Metric: The choice of a distance metric is crucial in the k-NN
classifier. Common distance metrics include Euclidean distance, Manhattan
distance, and Minkowski distance. The distance metric determines how "close"
or similar two data points are in the feature space.
3. Prediction Phase: When making a prediction for a new, unseen data point, the k-NN classifier calculates the distances between the new point and all the stored training instances, selects the k nearest neighbors, and assigns the class that appears most often among those neighbors (majority vote); a code sketch of this step follows the next paragraph.
The k-NN classifier is a versatile algorithm that is particularly useful when there
is limited prior knowledge about the data distribution or when decision
boundaries are complex. It serves as a baseline algorithm in many classification
tasks and provides a simple yet effective approach to classification based on the
neighbors' similarity.
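A minimal sketch of the prediction phase described above, written in Python; the dataset, the use of Euclidean distance, and k = 3 are assumptions made for the example rather than details from the notes.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Distances from the new point to every stored training instance (Euclidean)
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features, two classes (illustrative values only)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # expected class: 0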
Regression functions aim to find the best-fitting curve or surface that minimizes
the discrepancy between the predicted values and the actual values of the
response variable. They can be used for prediction, estimation, and
understanding the relationship between variables.
Linear regression with the least squares error criterion is a commonly used
method for fitting a linear relationship between a dependent variable and one or
more independent variables. It aims to find the best-fitting line or hyperplane
that minimizes the sum of squared differences between the observed values and
the predicted values.
Here's how linear regression with the least squares error criterion works:
1. Model: the dependent variable is expressed as a linear combination of the independent variables,
y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e
where:
y is the dependent (response) variable,
x1, x2, ..., xn are the independent variables,
b0, b1, ..., bn are the coefficients to be estimated, and
e is the error term.
2. Least Squares Criterion: the coefficients are chosen to minimize the sum of squared errors,
SSE = Σ (yi - ŷi)^2
where:
yi is the observed value and ŷi is the predicted value for the i-th observation.
Linear regression with the least squares error criterion is widely used due to its
simplicity and interpretability. It provides a linear relationship between the
independent variables and the dependent variable, allowing for understanding
the direction and magnitude of the relationships. However, it assumes linearity
and requires the errors to be independent and approximately normally distributed for the results to be reliable.
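A minimal sketch of fitting a line by least squares in Python; the data values below are invented for illustration, and np.linalg.lstsq is used here as one standard way to obtain the least squares solution.

import numpy as np

# Toy data: one independent variable x and a roughly linear response y (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones so the intercept b0 is estimated as well
X = np.column_stack([np.ones_like(x), x])

# Least squares: choose b0, b1 to minimize the sum of squared differences between y and X @ b
coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coeffs
print(b0, b1)  # fitted intercept and slope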
1. Model: Logistic regression models the probability of the positive class as
P(y=1 | x) = 1 / (1 + e^(-z))
where:
P(y=1 | x) is the probability of the positive class given the input features x,
z is the linear combination of the input features and their corresponding coefficients:
z = b0 + b1*x1 + b2*x2 + ... + bn*xn
b0, b1, b2, ..., bn are the coefficients or weights corresponding to the independent variables x1, x2, ..., xn.
2. Logistic Function: The logistic function transforms the linear
combination of the input features and coefficients into a value between 0 and 1.
It introduces non-linearity and allows for modeling the relationship between the
features and the probability of the positive class.
3. Estimation of Coefficients: The coefficients (weights) in logistic
regression are estimated using maximum likelihood estimation (MLE) or
optimization algorithms such as gradient descent. The objective is to find the
optimal set of coefficients that maximize the likelihood of the observed data or
minimize the log loss, which measures the discrepancy between the predicted
probabilities and the true class labels.
4. Decision Threshold: To make predictions, a decision threshold is applied to the predicted probabilities. Typically, a threshold of 0.5 is used, where probabilities greater than or equal to 0.5 are classified as the positive class, and probabilities less than 0.5 are classified as the negative class. The decision threshold can be adjusted to suit the problem, for example to trade off false positives against false negatives; a code sketch of these steps is given below.
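A minimal sketch of the steps above in Python, assuming a small made-up dataset and plain gradient descent on the log loss; the feature values, learning rate, and iteration count are illustrative choices, not details from the notes.

import numpy as np

def sigmoid(z):
    # Logistic function mapping the linear combination z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one feature, binary labels (illustrative values only)
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Add an intercept column so b0 is learned together with b1
Xb = np.column_stack([np.ones(len(X)), X])
w = np.zeros(Xb.shape[1])

# Gradient descent on the average log loss (equivalent to maximizing the likelihood)
for _ in range(5000):
    p = sigmoid(Xb @ w)              # predicted P(y=1 | x)
    grad = Xb.T @ (p - y) / len(y)   # gradient of the average log loss
    w -= 0.1 * grad

# Apply the 0.5 decision threshold to turn probabilities into class labels
preds = (sigmoid(Xb @ w) >= 0.5).astype(int)
print(w, preds)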
The MDL principle balances the complexity of the model with its ability to
accurately describe and compress the observed data. It provides a criterion for
selecting the most parsimonious and informative model, avoiding both
overfitting and underfitting.
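One common way to express this trade-off (the notation here is illustrative): the MDL principle prefers the hypothesis H that minimizes L(H) + L(D | H), where L(H) is the number of bits needed to describe the model and L(D | H) is the number of bits needed to describe the data given the model. A very complex model reduces L(D | H) but inflates L(H) (overfitting), while a very simple model does the opposite (underfitting).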
UNIT-IV
The basic idea behind SVM is to find a hyperplane that best separates the data
points of different classes. A hyperplane in this context is a higher-dimensional
analogue of a line in 2D or a plane in 3D. The hyperplane should maximize the
margin between the closest data points of different classes, called support