ML Short
Machine Learning is a subset of artificial intelligence (AI) that involves the use of
algorithms and statistical models to enable computers to perform tasks without explicit
instructions. Instead, systems learn from data to identify patterns and make decisions. This
process often includes steps like data collection, preprocessing, model training, evaluation,
and deployment.
Supervised Learning is a type of machine learning where the model is trained using labeled
data. This means that each training example is paired with an output label. The goal is for the
model to learn the relationship between input data and the corresponding output labels to
make predictions on new, unseen data.
• Supervised Learning:
o Uses labeled data.
o The goal is to predict outcomes based on input data.
o Common tasks include classification and regression.
• Unsupervised Learning:
o Uses unlabeled data.
o The goal is to find hidden patterns or intrinsic structures in the input data.
o Common tasks include clustering and dimensionality reduction.
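Bayes' Theorem describes the probability of an event based on prior knowledge of conditions
that might be related to the event. The formula is:
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
Where:
• P(A \mid B) is the posterior probability of A given that B has occurred.
• P(B \mid A) is the likelihood: the probability of B given A.
• P(A) is the prior probability of A.
• P(B) is the marginal probability (evidence) of B.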
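A small numeric sketch of Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), where the evidence P(B) is expanded with the law of total probability. The diagnostic-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    evidence = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * (1 - prior_a)
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical test: A = "has condition", B = "test is positive".
p = posterior(prior_a=0.01, likelihood_b_given_a=0.95, likelihood_b_given_not_a=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the posterior is only about 16% because the prior P(A) is small.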
Weight Space in the context of machine learning refers to the multidimensional space that
represents all possible values of the parameters (weights) of a model. Each point in this space
corresponds to a specific configuration of the model's weights.
• Overfitting: Occurs when a model learns the training data too well, capturing noise
and details that do not generalize to new data. This results in high accuracy on the
training set but poor performance on the test set.
• Underfitting: Happens when a model is too simple to capture the underlying
structure of the data, leading to poor performance on both training and test sets.
• Precision: The ratio of correctly predicted positive observations to the total predicted
positives: \text{Precision} = \frac{TP}{TP + FP}
• Recall (Sensitivity): The ratio of correctly predicted positive observations to all
actual positives: \text{Recall} = \frac{TP}{TP + FN}
• Accuracy: The ratio of correctly predicted observations to the total observations:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
Here TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and
false negatives.
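The three formulas can be checked by counting TP, TN, FP and FN directly from binary labels. This is a minimal sketch assuming labels are 0/1 with 1 as the positive class; returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # convention: 0.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # convention: 0.0 if there are no positives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]   # TP=3, TN=2, FP=2, FN=1
print(classification_metrics(y_true, y_pred))  # (0.6, 0.75, 0.625)
```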
Unit 2 Short answer Questions: 1. Define Linear Regression. 2. Compare Ridge Regression and
Lasso Regression. 3. Define Decision Tree. 4. Describe Multiple Linear Regression. 5. Define
Bayesian Linear Regression. 6. Explain Laplace approximation. 7. Define Probabilistic
Generative models. 8. Describe random forest model. 9. Define nearest neighbour. 10.
Define logistic regression.
ChatGPT
• Ridge Regression:
o Also known as Tikhonov regularization.
o Adds a penalty equal to the sum of the squared values of the coefficients to the
loss function.
o Helps to prevent overfitting by shrinking the coefficients.
o Formula: \text{Loss Function} = \text{RSS} + \lambda \sum_{j=1}^{n} \beta_j^2
• Lasso Regression:
o Stands for Least Absolute Shrinkage and Selection Operator.
o Adds a penalty equal to the sum of the absolute values of the coefficients to
the loss function.
o Can shrink some coefficients to zero, effectively performing variable
selection.
o Formula: \text{Loss Function} = \text{RSS} + \lambda \sum_{j=1}^{n} |\beta_j|
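A minimal NumPy sketch of the two penalised losses above, assuming centred data with no intercept term. Ridge uses its closed-form minimiser (X^T X + \lambda I)^{-1} X^T y. Lasso has no closed form, so this sketch uses cyclic coordinate descent with soft-thresholding, which is one common solver among several. Function names and the synthetic data are illustrative.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimise RSS + lam * sum(beta_j^2) in closed form (no intercept; centred data assumed)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Minimise RSS + lam * sum(|beta_j|) by cyclic coordinate descent (no intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Partial residual: prediction error with feature j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            # Exact minimiser of the objective along coordinate j.
            beta[j] = soft_threshold(X[:, j] @ r, lam / 2) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)
print(ridge(X, y, lam=1.0))
print(lasso(X, y, lam=20.0))
```

Increasing lam shrinks the Ridge coefficients smoothly toward zero, whereas a large enough lam sets Lasso coefficients exactly to zero, which is the variable-selection effect described above.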
3. Define Decision Tree
Decision Tree is a non-parametric supervised learning method used for classification and
regression. It splits the data into subsets based on the value of input features. Each internal
node represents a "test" on an attribute, each branch represents the outcome of the test, and
each leaf node represents a class label or continuous value.
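To show how one internal node picks its "test", here is a sketch that searches thresholds on a single numeric feature. The notes do not name a split criterion, so this assumes Gini impurity (entropy or information gain are common alternatives); a full tree would apply this recursively to each child.

```python
def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted child impurity) for the best test 'x <= threshold'."""
    best = (None, float("inf"))
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a useful split must send samples down both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

xs = [2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3.0, 0.0): both children are pure
```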
Multiple Linear Regression is an extension of simple linear regression that models the
relationship between a dependent variable and multiple independent variables. The model
aims to find the linear relationship:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
where y is the dependent variable, x_1, x_2, \ldots, x_n are the independent variables,
\beta_0, \beta_1, \ldots, \beta_n are the coefficients, and \epsilon is the error term.
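The coefficients \beta_0, \ldots, \beta_n are usually estimated by ordinary least squares. A minimal sketch with NumPy, assuming noise-free synthetic data so the true coefficients (1, 2, -0.5) are recovered exactly:

```python
import numpy as np

def fit_linear_regression(X, y):
    """Estimate beta_0..beta_n by ordinary least squares."""
    # Prepend a column of ones so that beta_0 acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]   # epsilon = 0 here, so the fit is exact
print(np.round(fit_linear_regression(X, y), 6))
```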
Bayesian Linear Regression incorporates prior distributions for the model parameters and
updates these priors with data to form posterior distributions. It provides a probabilistic
framework for linear regression that quantifies uncertainty in both the predictions and the model
parameters.
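A minimal sketch of the prior-to-posterior update, assuming the standard conjugate setting: a Gaussian prior N(0, prior_precision^-1 I) on the weights and Gaussian noise with known precision. Then the posterior is Gaussian with covariance S_N = (prior_precision I + noise_precision X^T X)^{-1} and mean m_N = noise_precision S_N X^T y. The precisions and data are illustrative choices.

```python
import numpy as np

def bayesian_linear_regression(X, y, prior_precision=1.0, noise_precision=25.0):
    """Gaussian posterior over weights for a Gaussian prior and known Gaussian noise."""
    d = X.shape[1]
    S_N = np.linalg.inv(prior_precision * np.eye(d) + noise_precision * X.T @ X)  # posterior covariance
    m_N = noise_precision * S_N @ X.T @ y                                          # posterior mean
    return m_N, S_N

def predict(x_new, m_N, S_N, noise_precision=25.0):
    """Predictive mean and variance: noise variance plus uncertainty about the weights."""
    mean = x_new @ m_N
    var = 1.0 / noise_precision + x_new @ S_N @ x_new
    return mean, var

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
X = np.column_stack([np.ones_like(x), x])          # bias column plus one feature
y = 0.5 + 1.5 * x + rng.normal(scale=0.2, size=20)  # noise std 0.2, i.e. precision 25
m_N, S_N = bayesian_linear_regression(X, y)
print(m_N)
print(predict(np.array([1.0, 0.3]), m_N, S_N))
```

Adding more data shrinks S_N, so the weight uncertainty decreases while the noise term 1/noise_precision remains.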
Probabilistic Generative Models are models that assume a probabilistic process for
generating the observed data. These models specify the joint probability distribution of the
input features and the output labels and can generate new samples from the learned
distribution. Examples include Gaussian Mixture Models and Hidden Markov Models.
Random Forest is an ensemble learning method for classification and regression that builds
multiple decision trees, each trained on a bootstrap sample of the data and typically restricted
to a random subset of features at each split, and combines them to get a more accurate and
stable prediction. It works by averaging the predictions of individual trees (for regression) or
taking the majority vote (for classification).
Logistic Regression is a statistical method for modeling the probability of a binary outcome
based on one or more predictor variables. It uses the logistic function to model a binary
dependent variable:
P(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}
where P(y=1 \mid x) is the probability that the dependent variable y equals 1 given the
predictors x_1, x_2, \ldots, x_n.
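A minimal sketch of fitting the coefficients by gradient descent on the average cross-entropy loss (one of several possible optimisers; libraries often use Newton-type methods instead). The learning rate, iteration count and the small 1-D dataset are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iter=5000):
    """Fit beta_0..beta_n by gradient descent on the mean cross-entropy loss."""
    X_design = np.column_stack([np.ones(len(X)), X])  # column of ones for beta_0
    beta = np.zeros(X_design.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X_design @ beta)             # P(y=1 | x) for every sample
        grad = X_design.T @ (p - y) / len(y)     # gradient of the mean cross-entropy
        beta -= lr * grad
    return beta

def predict_proba(X, beta):
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ beta)

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 1, 0, 1, 1])   # overlapping classes, so the fitted betas stay finite
beta = fit_logistic_regression(X, y)
print(beta, predict_proba(np.array([[0.0], [5.0]]), beta))
```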
Unit 3 Short answer Questions: 1. Define Clustering. 2. Define hard and soft clustering. 3.
What is hierarchical clustering? 4. Describe Spectral Clustering. 5. What is dimensionality
reduction? 6. Define Linear Discriminant Analysis. 7. Define Principal Component Analysis.
8. Explain distance metrics used in clustering algorithms. 9. Discuss topic modelling. 10.
Define latent variable models.
ChatGPT
1. Define Clustering
Clustering is a type of unsupervised learning that involves grouping a set of objects in such a
way that objects in the same group (called a cluster) are more similar to each other than to
those in other groups. It is often used for data exploration and pattern discovery.
• Hard Clustering: Each data point belongs to exactly one cluster. Algorithms like k-
means clustering assign each data point to a single cluster.
• Soft Clustering: Each data point can belong to multiple clusters with varying degrees
of membership. Algorithms like Gaussian Mixture Models (GMM) assign
probabilities to each data point for each cluster.
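The contrast between hard and soft clustering can be sketched with NumPy. The hard part is k-means (Lloyd's algorithm). For the soft part, this sketch assumes equal-weight spherical Gaussians with a shared variance sigma^2 centred on the k-means centroids, which gives GMM-style membership probabilities without running full EM. The data and sigma are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Hard clustering (Lloyd's algorithm): each point receives exactly one label."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

def soft_assignments(X, centroids, sigma=1.0):
    """Soft clustering: membership probabilities under equal-weight spherical Gaussians."""
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dists / (2 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability before exp
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8], [2.5, 2.5]])
labels, centroids = kmeans(X, k=2)
print(labels)                                        # one cluster index per point
print(np.round(soft_assignments(X, centroids), 3))   # one probability per cluster, rows sum to 1
```

The in-between point [2.5, 2.5] gets a single hard label, but its soft membership shows it is less clearly tied to one cluster than the other points.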
• Agglomerative (Bottom-Up): Starts with each data point as a single cluster and
merges the closest pairs of clusters until only one or a specified number of clusters
remain.
• Divisive (Top-Down): Starts with all data points in one cluster and recursively splits
them into smaller clusters.
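A naive sketch of the agglomerative (bottom-up) procedure. The notes do not specify how distance between clusters is measured; this assumes single linkage (the closest pair of members), and the O(n^3) search is for clarity, not efficiency.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters (single linkage)."""
    clusters = [[i] for i in range(len(X))]   # start: every point is its own cluster
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a (b > a, so indices stay valid)
        del clusters[b]
    return clusters

X = np.array([[0.0], [0.4], [1.0], [8.0], [8.5], [20.0]])
print(agglomerative(X, n_clusters=3))  # [[0, 1, 2], [3, 4], [5]]
```

Stopping at n_clusters=1 would give the full hierarchy; recording each merge produces the dendrogram.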
Spectral Clustering is a technique that uses the eigenvalues (spectrum) of a similarity matrix
(derived from the data) to perform dimensionality reduction before clustering in fewer
dimensions. It often works well for complex clustering problems where clusters are not
necessarily spherical.
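A two-cluster sketch of the idea. It assumes a Gaussian similarity matrix W and the unnormalised graph Laplacian L = D - W, then splits points by the sign of the eigenvector for the second-smallest eigenvalue (the Fiedler vector). For k > 2 clusters a common variant runs k-means on the first k eigenvectors; that step is not shown. The value of sigma is an illustrative choice.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split data into two clusters using the Fiedler vector of the graph Laplacian."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))   # Gaussian similarity matrix
    np.fill_diagonal(W, 0.0)                   # no self-loops
    L = np.diag(W.sum(axis=1)) - W             # unnormalised graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                    # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [5.0, 5.0], [5.2, 5.1], [4.9, 5.3]])
print(spectral_bipartition(X, sigma=2.0))
```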
Dimensionality Reduction is the process of reducing the number of random variables under
consideration by obtaining a set of principal variables. It simplifies the dataset while retaining
as much of the relevant information as possible (in PCA, for example, as much of the variance
as possible). It is often used to reduce the complexity of models and to visualize
high-dimensional data.
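As one concrete instance, PCA can be computed from the singular value decomposition of the centred data. This is a minimal sketch; the synthetic 3-D data is built so that almost all of its variance lies along a single direction, which a single principal component captures.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (directions of maximum variance)."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                          # principal directions (rows)
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
t = rng.normal(size=100)
# Column 2 is almost 2 * column 1, and column 3 is small noise.
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=100), rng.normal(scale=0.1, size=100)])
Z, components, var = pca(X, n_components=1)
print(Z.shape, var)   # 3 features reduced to 1
```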
Topic Modelling is a type of statistical model used to discover abstract topics within a
collection of documents. It helps in identifying patterns and themes in large sets of
unstructured text data. Common methods include:
• Latent Dirichlet Allocation (LDA): Assumes that documents are mixtures of topics
and that topics are mixtures of words.
• Non-Negative Matrix Factorization (NMF): Decomposes the document-term
matrix into two lower-dimensional matrices, identifying topics.
10. Define Latent Variable Models
Latent Variable Models are models that include variables that are not directly observed but
are inferred from other observed variables. These latent variables help to explain complex
patterns in the observed data. Examples include:
Unit 4 Short answer Questions: 1. Define Hidden Markov Model. 2. What are Bayesian
networks? 3. Describe Markov random fields. 4. What is text classification? 5. Write the
assumptions of the naïve Bayes classifier. 6. What is a joint probability distribution? 7. Compare
the Markov model and HMM. 8. List different types of graphical models. 9. Write the limitations
of the Markov model. 10. Define state, transition and emission probabilities in a Hidden Markov
Model.
ChatGPT
Hidden Markov Model (HMM) is a statistical model that represents systems where the
states are not directly visible (hidden) but can be observed through some output (emission). It
consists of:
Bayesian Networks (also known as Belief Networks or Bayes Nets) are graphical models
that represent the probabilistic relationships among a set of variables using a directed acyclic
graph (DAG). Each node represents a random variable, and the edges represent conditional
dependencies.
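The DAG lets the joint distribution factor into one conditional per node given its parents. A sketch with a hypothetical three-node network (Rain → Sprinkler, and Rain, Sprinkler → WetGrass); the probability tables are made-up illustrative values. Inference here is by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical conditional probability tables.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}         # P(S=True | R)
P_WET_GIVEN = {(True, True): 0.99, (True, False): 0.9,    # P(W=True | S, R)
               (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S, R), the factorisation implied by the DAG."""
    p_s = P_SPRINKLER_GIVEN_RAIN[rain] if sprinkler else 1 - P_SPRINKLER_GIVEN_RAIN[rain]
    p_w = P_WET_GIVEN[(sprinkler, rain)] if wet else 1 - P_WET_GIVEN[(sprinkler, rain)]
    return P_RAIN[rain] * p_s * p_w

def p_rain_given_wet():
    """Inference by enumeration: sum the joint over the unobserved variable(s)."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

print(round(p_rain_given_wet(), 4))
```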
Markov Random Fields (MRFs) are undirected graphical models that represent the
dependencies between random variables. Each node represents a random variable, and the
edges represent the conditional dependencies between them. Unlike Bayesian networks,
MRFs do not have directed edges and can capture symmetric dependencies.
Joint Probability Distribution is a probability distribution that gives the probability of each
possible combination of values of two or more random variables. For two variables X and
Y, the joint probability distribution P(X, Y) gives P(X = x, Y = y), the probability of
X = x and Y = y occurring together.
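For discrete variables, a joint distribution is just a table whose entries sum to 1, and marginals and conditionals follow from it. A sketch with a hypothetical 2×2 table, where rows index the values x_0, x_1 of X and columns index the values of Y:

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows are values of X, columns are values of Y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])
assert np.isclose(joint.sum(), 1.0)   # a joint distribution must sum to 1

p_x = joint.sum(axis=1)               # marginal P(X = x): sum over y
p_y = joint.sum(axis=0)               # marginal P(Y = y): sum over x
p_y_given_x0 = joint[0] / p_x[0]      # conditional P(Y | X = x_0) = P(X = x_0, Y) / P(X = x_0)
print(p_x, p_y, p_y_given_x0)
```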
• Markov Model:
o The states are directly observable.
o It describes a system where the next state depends only on the current state
(Markov property).
• Hidden Markov Model (HMM):
o The states are hidden and not directly observable.
o It includes observable outputs (emissions) that are probabilistically related to
the hidden states.
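The difference shows up when computing the probability of data. In a Markov model the state sequence is observed, so its probability is a product of transition probabilities. In an HMM the states are hidden, so the probability of an observation sequence must sum over every possible hidden path; the forward algorithm does this efficiently. The two-state model and its numbers are hypothetical.

```python
import numpy as np

def forward(pi, A, B, observations):
    """P(observation sequence) under an HMM, summed over all hidden state paths.

    pi[i]   : initial probability of hidden state i
    A[i, j] : transition probability from state i to state j
    B[i, k] : emission probability of observable symbol k from state i
    """
    alpha = pi * B[:, observations[0]]      # alpha[i] = P(first observation, state i)
    for obs in observations[1:]:
        alpha = (alpha @ A) * B[:, obs]     # propagate through transitions, then emit
    return alpha.sum()

# Hypothetical model: hidden weather (0 = sunny, 1 = rainy), observed activity (0 = walk, 1 = shop).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 0]))
```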
• Assumes Markov Property: Only the current state influences the next state, which
may not capture all dependencies in real-world scenarios.
• State Explosion: The number of states can grow exponentially with the number of
variables, leading to computational challenges.
• No Hidden States: Does not model systems where the true state is not directly
observable.
Unit 5 Short answer Questions: 1. Define advanced learning. 2. What is Representation
learning? 3. Explain Ensemble Learning methods. 4. Compare boosting and bagging. 5.
Define deep learning. 6. What is reinforcement learning? 7. Define neural network. 8. Define
cost function. 9. Compare CNN and RNN. 10. Define active learning.
ChatGPT
Ensemble Learning methods combine multiple machine learning models to improve overall
performance. The idea is to combine the strengths of the individual models so that the ensemble
predicts more accurately or more robustly than any single member. Common ensemble techniques
include:
• Bagging (Bootstrap Aggregating): Trains multiple models (e.g., decision trees) independently
on bootstrap samples of the data (random samples drawn with replacement) and aggregates their
predictions by averaging or majority vote.
• Boosting: Sequentially trains models, each correcting the errors of its predecessor.
• Stacking: Combines multiple models using a meta-model that learns how to best
combine their outputs.
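A minimal bagging sketch: each base model is fit on its own bootstrap sample, and predictions are combined by majority vote. The base learner here is a hypothetical one-split "stump" on a single feature that predicts 1 when x exceeds a learned threshold; real bagging usually uses deeper decision trees. Boosting and stacking are not shown.

```python
import numpy as np

def fit_stump(x, y):
    """Hypothetical base learner: choose the threshold t minimising error of 'predict 1 if x > t'."""
    best_t, best_err = x.min(), np.inf
    for t in np.unique(x):
        err = np.mean((x > t).astype(int) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging(x, y, n_models=25, seed=0):
    """Fit each base model on a bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    thresholds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))   # bootstrap sample indices
        thresholds.append(fit_stump(x[idx], y[idx]))
    return np.array(thresholds)

def predict(thresholds, x_new):
    votes = (x_new[:, None] > thresholds[None, :]).astype(int)  # one vote per model
    return (votes.mean(axis=1) > 0.5).astype(int)               # majority vote

x = np.arange(1, 9, dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
thresholds = bagging(x, y)
print(predict(thresholds, np.array([0.0, 4.5, 10.0])))
```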
Deep Learning is a subset of machine learning that involves neural networks with many
layers (deep neural networks). These networks can automatically learn hierarchical
representations of data, making them particularly effective for tasks such as image
recognition, natural language processing, and speech recognition.
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make
decisions by performing actions in an environment to maximize cumulative reward. The
agent receives feedback in the form of rewards or penalties and adjusts its strategy to achieve
long-term goals.
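A minimal sketch of the reward-driven learning loop, using tabular Q-learning (one RL algorithm among many) on a hypothetical five-state corridor where the only reward is for reaching the rightmost state. The agent explores with random actions; because Q-learning is off-policy, it still learns the values of the greedy policy. The environment, learning rate and discount factor are illustrative assumptions.

```python
import numpy as np

N_STATES, GOAL = 5, 4       # hypothetical corridor: states 0..4, episode ends at state 4
ACTIONS = (-1, +1)          # action 0 = move left, action 1 = move right

def step(state, action):
    """Deterministic environment: move along the corridor; reward 1 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = int(rng.integers(len(ACTIONS)))   # random exploratory behaviour policy
            next_state, reward, done = step(state, action)
            # Move Q toward the reward plus the discounted value of the best next action.
            target = reward + (0.0 if done else gamma * Q[next_state].max())
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

Q = q_learning()
print(Q.argmax(axis=1)[:GOAL])   # greedy action in each non-goal state (1 = move right)
```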
Neural Network is a computational model inspired by the way biological neural networks in
the human brain process information. It consists of interconnected nodes (neurons) organized
in layers. Each connection has an associated weight, and neurons apply an activation function
to their inputs to produce an output. Neural networks are used for a variety of tasks, including
classification, regression, and pattern recognition.
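A sketch of the forward pass described above: each hidden neuron takes a weighted sum of its inputs plus a bias and applies an activation (ReLU here), and the output layer combines the hidden activations. The weights are hand-picked rather than learned, chosen so that this small network computes XOR, a function no single linear model can represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W1, b1, W2, b2):
    """Two-layer network: hidden = relu(X W1 + b1), output = hidden W2 + b2."""
    hidden = relu(X @ W1 + b1)   # weighted sums followed by the activation function
    return hidden @ W2 + b2      # linear output layer

# Hand-picked (not trained) weights that make the network compute XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
b2 = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(X, W1, b1, W2, b2))   # XOR truth table: 0, 1, 1, 0
```

In practice the weights would be learned by minimising a cost function with gradient descent and backpropagation.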
Cost Function (or Loss Function) is a measure of how well a machine learning model's
predictions match the actual outcomes. It quantifies the error between predicted values and
true values. The goal of training a model is to minimize the cost function, thereby improving
the model's accuracy. Common cost functions include Mean Squared Error (MSE) for
regression and Cross-Entropy Loss for classification.
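The two cost functions named above can be written in a few lines. This sketch uses the binary form of cross-entropy and clips probabilities by a small eps (an implementation convenience, not part of the definition) to avoid log(0).

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under the predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)   # keep log() finite
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(mean_squared_error(np.array([3.0, 5.0]), np.array([2.5, 5.5])))   # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.1])))     # -ln(0.9)
```

Confident wrong predictions are penalised heavily by cross-entropy, which is why it is preferred over MSE for classification.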
Active Learning is a type of machine learning where the algorithm selectively queries the
most informative data points for labeling. Instead of using a randomly selected subset of data,
active learning identifies and requests labels for the most uncertain or representative
examples. This approach is useful when labeling data is expensive or time-consuming, as it
can significantly reduce the amount of labeled data needed to train an effective model.
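One common query strategy is uncertainty sampling: ask for labels on the examples the current model is least sure about. A minimal sketch for a binary classifier, assuming we already have its predicted probabilities P(y=1 | x) for a pool of unlabelled examples (the values below are hypothetical):

```python
import numpy as np

def uncertainty_sampling(probabilities, batch_size=1):
    """Return indices of the examples whose P(y=1) is closest to 0.5 (least confident first)."""
    return np.argsort(np.abs(probabilities - 0.5))[:batch_size]

# Hypothetical model outputs for five unlabelled examples.
probs = np.array([0.95, 0.48, 0.10, 0.70, 0.55])
print(uncertainty_sampling(probs, batch_size=2))   # [1 4]
```

After the queried examples are labelled, the model is retrained and the loop repeats until the labelling budget is spent.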