Unit-4 AI
There are several ways to model uncertainty in AI. Bayesian approaches quantify
uncertainty by treating model parameters as probabilistic entities, offering confidence
intervals or probability distributions for predictions. Fuzzy logic addresses uncertainty
by allowing partial truth values between 0 and 1, making it useful for systems where
binary decisions (true/false) are inadequate. Probabilistic graphical models like Hidden
Markov Models or Bayesian Networks handle uncertainty by modelling relationships
between variables and their likelihoods.
Additionally, deep learning models handle uncertainty through techniques like dropout
as a regularization method, which can be interpreted to provide uncertainty estimates
in predictions. Uncertainty measures play a critical role in applications like autonomous
systems, healthcare, and decision-making processes, where incorrect or overconfident
predictions can have significant consequences.
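As a rough illustration of how dropout can be reused for uncertainty estimation, the sketch below (plain NumPy, with made-up weights) keeps dropout active at prediction time and treats the spread of repeated stochastic forward passes as an uncertainty estimate, which is the idea behind Monte Carlo dropout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer network with fixed (pretend "trained") weights.
W1 = rng.normal(size=(1, 16))
W2 = rng.normal(size=(16, 1))

def forward(x, drop_rate=0.5):
    """One stochastic forward pass: dropout stays active at prediction time."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) > drop_rate        # randomly drop hidden units
    h = h * mask / (1.0 - drop_rate)              # inverted-dropout scaling
    return h @ W2

x = np.array([[0.3]])
preds = np.array([forward(x) for _ in range(200)])  # 200 Monte Carlo passes

print("mean prediction:", preds.mean())
print("uncertainty (std of predictions):", preds.std())
```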
Uncertainty in artificial intelligence (AI) refers to the lack of complete information or the
presence of variability in data and models. Understanding and modeling uncertainty is
crucial for making informed decisions and improving the robustness of AI systems.
There are several types of uncertainty in AI, including:
1. Aleatoric Uncertainty: This type of uncertainty arises from the inherent
randomness or variability in data. It is often referred to as “data uncertainty.” For
example, in a classification task, aleatoric uncertainty may arise from variations
in sensor measurements or noisy labels.
2. Epistemic Uncertainty: Epistemic uncertainty is related to the lack of
knowledge or information about a model. It represents uncertainty that can
potentially be reduced with more data or better modeling techniques. It is also
known as “model uncertainty” and arises from model limitations, such as
simplifications or assumptions.
3. Parameter Uncertainty: This type of uncertainty is specific to probabilistic
models, such as Bayesian neural networks. It reflects uncertainty about the
values of model parameters and is characterized by probability distributions over
those parameters.
4. Uncertainty in Decision-Making: Uncertainty in AI systems can affect the
decision-making process. For instance, in reinforcement learning, agents often
need to make decisions in environments with uncertain outcomes, leading to
decision-making uncertainty.
5. Uncertainty in Natural Language Understanding: In natural language
processing (NLP), understanding and generating human language can be
inherently uncertain due to language ambiguity, polysemy (multiple meanings),
and context-dependent interpretations.
6. Uncertainty in Probabilistic Inference: Bayesian methods and probabilistic
graphical models are commonly used in AI to model uncertainty. Uncertainty can
arise from the process of probabilistic inference itself, affecting the reliability of
model predictions.
7. Uncertainty in Reinforcement Learning: In reinforcement learning,
uncertainty may arise from the stochasticity of the environment or the
exploration-exploitation trade-off. Agents must make decisions under uncertainty
about the outcomes of their actions.
8. Uncertainty in Autonomous Systems: Autonomous systems, such as self-
driving cars or drones, must navigate uncertain and dynamic environments. This
uncertainty can pertain to the movement of other objects, sensor
measurements, and control actions.
9. Uncertainty in Safety-Critical Systems: In applications where safety is
paramount, such as healthcare or autonomous vehicles, managing uncertainty is
critical. Failure to account for uncertainty can lead to dangerous consequences.
10. Uncertainty in Transfer Learning: When transferring a pre-trained AI model to
a new domain or task, uncertainty can arise due to domain shift or differences in
data distributions. Understanding this uncertainty is vital for adapting the model
effectively.
11. Uncertainty in Human-AI Interaction: When AI systems interact with
humans, there can be uncertainty in understanding and responding to human
input, as well as uncertainty in predicting human behavior and preferences.
Addressing and quantifying these various types of uncertainty is an ongoing research
area in AI, and techniques such as probabilistic modeling, Bayesian inference, and
Monte Carlo methods are commonly used to manage and mitigate uncertainty in AI
systems.
Techniques for Addressing Uncertainty in AI
We’ve just discussed the different types of uncertainty in AI. Now, let’s switch gears
and learn techniques for addressing uncertainty in AI. It’s like going from understanding
the problem to finding solutions for it.
Probabilistic Logic Programming
Probabilistic logic programming (PLP) is a way to mix logic and probability to handle
uncertainty in computer programs. This is useful when programmers are not completely
sure about the facts and rules they are working with. PLP uses probabilities to help
them make decisions and learn from data. Different techniques, like Bayesian logic
programs or Markov logic networks, can be used to put PLP into action. PLP is handy in
various areas of artificial intelligence, such as reasoning under uncertainty, planning
under risk, and building graphical and symbolic models.
Fuzzy Logic Programming
To deal with uncertainty in logic programming, there’s a method called fuzzy logic
programming (FLP). FLP combines regular logic with something called “fuzzy” logic.
This helps programmers express things that are a bit unclear or not black and white.
FLP also helps them make decisions and learn from this uncertain information. They can
use different ways to do FLP, like fuzzy Prolog, fuzzy answer set programming, and
fuzzy description logic. FLP is useful in various areas of artificial intelligence, like
understanding language, working with images, and making decisions when things are
not very clear.
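As a small illustration of partial truth values, the sketch below defines a hypothetical fuzzy set "warm" over temperatures and combines membership degrees with the commonly used min/max fuzzy operators; the breakpoints (15 °C, 25 °C, 35 °C) are made up for the example.

```python
def warm_membership(temp_c):
    """Degree (0..1) to which a temperature counts as 'warm' (illustrative triangular set)."""
    if temp_c <= 15 or temp_c >= 35:
        return 0.0
    if temp_c <= 25:                      # rises from 0 at 15 C to 1 at 25 C
        return (temp_c - 15) / 10.0
    return (35 - temp_c) / 10.0           # falls back to 0 at 35 C

# Fuzzy AND/OR are commonly taken as min/max of membership degrees.
def fuzzy_and(a, b): return min(a, b)
def fuzzy_or(a, b): return max(a, b)

print(warm_membership(20))    # 0.5 -> partially warm, not just true/false
print(fuzzy_and(warm_membership(20), warm_membership(30)))
```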
Probability Theory
Introduction to Probabilistic Reasoning
Probabilistic reasoning provides a mathematical framework for representing and
manipulating uncertainty. Unlike deterministic systems, which operate under the
assumption of complete and exact information, probabilistic systems acknowledge that
the real world is fraught with uncertainties. By employing probabilities, AI systems can
make informed decisions even in the face of ambiguity.
Need for Probabilistic Reasoning in AI
Probabilistic reasoning is important to many AI tasks, such as:
Machine Learning: Helps algorithms learn from possibly incomplete or noisy
data.
Robotics: Provides robots the capability to act in and interact with dynamic and
uncertain environments.
Natural Language Processing: Gives computers an understanding of human
language in all its ambiguity and sensitivity to context.
Decision-Making Systems: Empowers AI systems to make well-informed decisions
and judgments by considering the likelihood of alternative outcomes.
By representing uncertainty explicitly, probabilistic reasoning allows an AI system to
operate sensibly in the real world and make effective predictions.
Key Concepts in Probabilistic Reasoning
1. Bayesian Networks
Imagine a spider web of interrelated factors, something like a detective board
connecting suspects, motives, and evidence. That, in a nutshell, is the basic
intuition behind a Bayesian network: a graphical model showing the
relationships between variables and their conditional probabilities.
Advantages: Bayesian networks are very effective at expressing cause-and-effect
relationships and at reasoning about missing information. They have found wide
application in medical diagnosis, where symptoms and diseases are modeled as
variables with different degrees of association.
2. Markov Models
Consider a weather forecast. A (first-order) Markov model predicts the future
state of a system from its current state alone, without needing the full past
history. For instance, according to a simple Markov model of weather, the
probability that a sunny day will be followed by another sunny day is greater
than the probability that it will be followed by a rainy day.
Advantages: Markov models are effective and easy to implement. They are
widely used in areas such as speech recognition and next-word prediction,
where the probability of the next word depends on the preceding words.
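A minimal sketch of such a weather Markov chain is given below; the transition probabilities are illustrative assumptions, with sunny-to-sunny more likely than sunny-to-rainy as in the text.

```python
import numpy as np

states = ["sunny", "rainy"]
# Row = today's state, column = tomorrow's; sunny -> sunny is more likely than sunny -> rainy.
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

rng = np.random.default_rng(1)

def simulate(days, start="sunny"):
    """Sample a weather sequence using only the current state (Markov property)."""
    i = states.index(start)
    seq = [start]
    for _ in range(days - 1):
        i = rng.choice(2, p=P[i])
        seq.append(states[i])
    return seq

print(simulate(7))
# Distribution of the weather two days ahead, given sunny today: first row of P @ P.
print(dict(zip(states, (P @ P)[0])))
```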
3. Hidden Markov Models (HMMs)
Consider a weather-prediction scenario in which some quantities are observed
while others, such as humidity, are hidden. HMMs are a generalization of
Markov models in which the underlying states are hidden and only related
observations are visible.
Advantages: HMMs are very powerful when hidden variables must be taken into
account, for example in stock market prediction, where the factors that govern
prices are not fully observable.
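Below is a small sketch of the HMM forward algorithm for a toy model in which the weather is hidden and only an "umbrella seen / not seen" observation is visible; all the probabilities are made-up values for illustration.

```python
import numpy as np

# Hidden states: 0 = rainy, 1 = sunny. Observations: 0 = umbrella seen, 1 = no umbrella.
start = np.array([0.5, 0.5])                 # initial state distribution (assumed)
trans = np.array([[0.7, 0.3],                # P(next hidden state | current hidden state)
                  [0.3, 0.7]])
emit = np.array([[0.9, 0.1],                 # P(observation | hidden state)
                 [0.2, 0.8]])

def forward(observations):
    """Forward algorithm: probability of the observation sequence under the HMM."""
    alpha = start * emit[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

print(forward([0, 0, 1]))   # likelihood of seeing umbrella, umbrella, no umbrella
```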
4. Probabilistic Graphical Models
Probabilistic Graphical Models give a broader framework encompassing both
Bayesian networks and HMMs. In general, PGMs are an approach for
representation and reasoning in a framework of uncertain information, given in
graphical structure.
Advantages: PGMs offer a powerful, flexible, and expressive language for doing
probabilistic reasoning, which is well suited for complex relationships that may
capture many different types of uncertainty.
These techniques are not mutually exclusive; they can be combined and extended to
handle increasingly specific problems in AI. The particular technique used will depend
on the character of the uncertainty and the type of result sought. In turn, probabilistic
reasoning allows AI systems to make not just predictions but quantified ones, leading to
more robust and reliable decision-making.
Techniques in Probabilistic Reasoning
1. Inference: The process of computing the probability distribution of certain
variables given known values of other variables. Exact inference methods include
variable elimination and the junction tree algorithm, while approximate inference
methods include Markov Chain Monte Carlo (MCMC) and belief propagation.
2. Learning: Involves updating the parameters and structure of probabilistic
models based on observed data. Techniques include maximum likelihood
estimation, Bayesian estimation, and expectation-maximization (EM).
3. Decision Making: Utilizing probabilistic models to make decisions that
maximize expected utility. Techniques involve computing expected rewards and
selecting actions accordingly, often implemented using frameworks like POMDPs.
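As a tiny illustration of approximate inference by sampling (a Monte Carlo method in the spirit of the techniques above, using plain rejection sampling rather than full MCMC), the sketch below estimates P(Rain | WetGrass) in a two-variable model whose probabilities are assumed for the example.

```python
import random

random.seed(0)

def sample_once():
    rain = random.random() < 0.3                      # P(Rain) = 0.3 (assumed)
    p_wet = 0.9 if rain else 0.1                      # P(Wet | Rain), P(Wet | no Rain) (assumed)
    wet = random.random() < p_wet
    return rain, wet

def estimate_p_rain_given_wet(n=100_000):
    """Rejection sampling: keep only samples consistent with the evidence Wet = True."""
    kept = rain_count = 0
    for _ in range(n):
        rain, wet = sample_once()
        if wet:
            kept += 1
            rain_count += rain
    return rain_count / kept

print(estimate_p_rain_given_wet())   # approaches the exact posterior 0.27/0.34 = 0.794...
```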
How Probabilistic Reasoning Empowers AI Systems
Imagine for a moment that you are in a maze with nothing but an out-of-focus map.
Traditional, rule-based reasoning would grind to a halt, unable to reason about the
likelihood of a dead end or an unclear path. Probabilistic reasoning is like a powerful
flashlight that can show the path ahead even in circumstances of uncertainty.
This is the way in which probabilistic reasoning empowers AI systems:
Quantifying Uncertainty: Probabilistic reasoning does not shrink from
uncertainty. It turns to the tools of probability theory to represent uncertainty by
attaching degrees of likelihood. For example, instead of a simple “true” or “false”
to whether it will rain tomorrow, probabilistic reasoning might assign a 60%
chance that it will.
Reasoning with Evidence: AI systems cannot make decisions in isolation. They
have to take the available evidence into account and use it to refine their
probabilities. For example, the probability of rain can be refined upward to 80%
if dark clouds roll in during the afternoon.
Learning from Past Experience: AI systems can learn from past experiences,
and probabilistic reasoning factors this prior knowledge into decisions. For
example, an AI system trained on historical weather data for your location
might consider seasonal trends when calculating the probability of rain.
Effective Decision-Making: Probabilistic reasoning also enables AI systems to
make effective, well-informed decisions based on quantified uncertainty,
evidence, and prior knowledge. Returning to our maze analogy, the AI can weigh
the probability of different paths, given the map and whatever it has already
explored, making it much more likely to reach the goal.
Probabilistic reasoning is not about achieving perfection in a world full of uncertainty,
but about recognizing the limits of perfect knowledge and working as well as possible
with the information available. This enables AI systems to perform soundly in the real
world, which is full of vagueness and where information is generally incomplete.
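The rain example from the bullets above can be written out as a one-step Bayes-rule update. The 60% prior comes from the text; the two likelihoods for "dark clouds" are assumptions chosen so that the posterior lands near the 80% mentioned.

```python
# Bayes-rule update for the rain example above.
p_rain = 0.6                     # prior from the text
p_clouds_given_rain = 0.8        # assumed likelihood
p_clouds_given_no_rain = 0.3     # assumed likelihood

p_clouds = (p_clouds_given_rain * p_rain
            + p_clouds_given_no_rain * (1 - p_rain))

p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(round(p_rain_given_clouds, 2))   # 0.8
```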
Applications of Probabilistic Reasoning in AI
Probabilistic reasoning is widely applicable in a variety of domains:
1. Robotics: Probabilistic reasoning enables robots to navigate and interact with
uncertain environments. For instance, simultaneous localization and mapping
(SLAM) algorithms use probabilistic techniques to construct maps of unknown
environments while tracking the robot’s location.
2. Healthcare: In medical diagnosis, probabilistic models help in assessing the
likelihood of diseases given symptoms and test results. Bayesian networks, for
example, can model the relationships between various medical conditions and
diagnostic indicators.
3. Natural Language Processing (NLP): Probabilistic models, such as HMMs and
Conditional Random Fields (CRFs), are used for tasks like part-of-speech tagging,
named entity recognition, and machine translation.
4. Finance: Probabilistic reasoning is used to model market behavior, assess risks,
and make investment decisions. Techniques like Bayesian inference and Monte
Carlo simulations are commonly employed in financial modeling.
Advantages of Probabilistic Reasoning
Flexibility: Probabilistic models can handle a wide range of uncertainties and
are adaptable to various domains.
Robustness: These models are robust to noise and incomplete data, making
them reliable in real-world applications.
Interpretability: Probabilistic models provide a clear framework for understanding
and quantifying uncertainty, which aids transparency and explainability.
Conclusion
Probabilistic reasoning is one of the most important methods for empowering AI
applications; it is widely used to deal with uncertainty and make logical decisions. With
built-in probabilities, AI systems can navigate the complexities of the real world,
ultimately improving both reliability and performance.
Real-world applications are probabilistic in nature, and to represent the relationships
between multiple events, we need a Bayesian network. It can be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time-series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and
it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems
under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links),
where:
o Each node corresponds to a random variable, which can be continuous or
discrete.
o Arcs (directed arrows) represent causal relationships or conditional
probabilities between random variables. These directed links connect pairs of
nodes in the graph. A link means that one node directly influences the other;
if there is no directed link between two nodes, they do not directly influence
each other.
o For example, consider a network whose nodes represent the random
variables A, B, C, and D.
o If node B is connected to node A by a directed arrow from A to B, then
node A is called the parent of node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is
known as a directed acyclic graph, or DAG.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi |
Parents(Xi)), which quantifies the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and conditional probability.
So let's first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of all the different
combinations of x1, x2, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ..., xn] can be written in the following way using the chain rule:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] .... P[xn-1 | xn] P[xn].
In general, for each variable Xi we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a directed
acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The
alarm reliably responds to a burglary but also sometimes responds to minor
earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David always calls
Harry when he hears the alarm, but sometimes he gets confused with the phone ringing
and calls then too. On the other hand, Sophia likes to listen to loud music, so she
sometimes misses the alarm. Here we would like to compute the probability of the
burglar alarm sounding.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia called Harry.
Solution:
o In the Bayesian network for this problem, Burglary and Earthquake are the
parent nodes of Alarm and directly affect the probability of the alarm going
off, while David's and Sophia's calls depend only on the alarm.
o The network represents the assumptions that David and Sophia do not directly
perceive the burglary, do not notice minor earthquakes, and do not confer
before calling.
o The conditional distribution for each node is given as a conditional probability
table, or CPT.
o Each row in a CPT must sum to 1 because its entries represent an exhaustive
set of cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents contains 2^k probability
values. Hence, if there are two parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David calls (D)
o Sophia calls (S)
We can write the events of the problem statement as the joint probability P[D, S, A, B, E]
and expand it using the chain rule together with the conditional independences encoded
in the network:
P[D, S, A, B, E] = P[D | S, A, B, E] . P[S, A, B, E]
= P[D | S, A, B, E] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B | E] . P[E]
Let's take the observed prior probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, which is the probability of a burglary.
P(B = False) = 0.998, which is the probability of no burglary.
P(E = True) = 0.001, which is the probability of a minor earthquake.
P(E = False) = 0.999, which is the probability that an earthquake did not occur.
We can provide the conditional probabilities as CPTs. The conditional probability of
Alarm A depends on Burglary and Earthquake; the CPT entries used in this problem are
P(A | ¬B, ¬E) = 0.001, P(D | A) = 0.91, and P(S | A) = 0.75 (the full tables are omitted here).
From the formula of joint distribution, we can write the problem statement in the form
of probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using
the joint distribution.
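The same calculation can be reproduced in a few lines of Python using the probabilities given in the example:

```python
# Probabilities taken from the burglary example above.
p_b_false = 0.998                # P(not B)
p_e_false = 0.999                # P(not E)
p_a_given_not_b_not_e = 0.001    # P(A | not B, not E), from the Alarm CPT
p_d_given_a = 0.91               # P(D | A)
p_s_given_a = 0.75               # P(S | A)

p = (p_s_given_a * p_d_given_a * p_a_given_not_b_not_e
     * p_b_false * p_e_false)
print(p)   # about 0.00068045
```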
The semantics of Bayesian Network:
There are two ways to understand the semantics of a Bayesian network, which are
given below:
1. To understand the network as a representation of the joint probability
distribution.
This is helpful for understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional
independence statements.
This is helpful for designing inference procedures.
MACHINE LEARNING
Supervised Learning
Key Points:
Supervised learning involves training a machine from labeled data.
Labeled data consists of examples with the correct answer or classification.
The machine learns the relationship between inputs (fruit images) and outputs
(fruit labels).
The trained machine can then make predictions on new, unlabeled data.
Example:
Let's say you have a basket of fruit that you want the machine to identify. The machine
would first analyze each image to extract features such as shape, color, and texture.
Then it would compare these features to the features of the fruits it has already learned
about. If the new image's features are most similar to those of an apple, the machine
would predict that the fruit is an apple.
For instance, suppose you are given a basket filled with different kinds of fruits. Now
the first step is to train the machine with all the different fruits one by one like this:
If the shape of the object is rounded and has a depression at the top, is red in
color, then it will be labeled as –Apple.
If the shape of the object is a long curving cylinder having Green-Yellow color,
then it will be labeled as –Banana.
Now suppose that, after training, the machine is given a new fruit from the basket, say
a banana, and asked to identify it.
Since the machine has already learned from the previous data, it can now use that
knowledge: it will first classify the fruit by its shape and color, confirm the fruit name as
BANANA, and put it in the banana category. Thus the machine learns from the training
data (the basket of fruits) and then applies that knowledge to the test data (the new
fruit).
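A minimal sketch of such a fruit classifier is shown below, using scikit-learn's decision tree and a hypothetical numeric encoding of the shape and color features described above.

```python
# Hypothetical numeric encoding of the fruit features described above:
# shape: 0 = rounded, 1 = long/curved; color: 0 = red, 1 = green-yellow.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 0], [0, 0], [1, 1], [1, 1]]      # labeled training fruits
y_train = ["Apple", "Apple", "Banana", "Banana"]

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

new_fruit = [[1, 1]]                            # long, green-yellow -> should be Banana
print(clf.predict(new_fruit))                   # ['Banana']
```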
Types of Supervised Learning
Supervised learning is classified into two categories of algorithms:
Regression: A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.
Classification: A classification problem is when the output variable is a
category, such as "red" or "blue", or "disease" or "no disease".
Supervised learning deals with or learns with “labeled” data. This implies that some
data is already tagged with the correct answer.
1- Regression
Regression is a type of supervised learning that is used to predict continuous values,
such as house prices, stock prices, or customer churn. Regression algorithms learn a
function that maps from the input features to the output value.
Some common regression algorithms include:
Linear Regression
Polynomial Regression
Support Vector Machine Regression
Decision Tree Regression
Random Forest Regression
2- Classification
Classification is a type of supervised learning that is used to predict categorical values,
such as whether a customer will churn or not, whether an email is spam or not, or
whether a medical image shows a tumor or not. Classification algorithms learn a
function that maps from the input features to a probability distribution over the output
classes.
Some common classification algorithms include:
Logistic Regression
Support Vector Machines
Decision Trees
Random Forests
Naive Bayes
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that the model
is accurate and generalizable. There are a number of different metrics that can be used
to evaluate supervised learning models, but some of the most common ones include:
For Regression
Mean Squared Error (MSE): MSE measures the average squared difference
between the predicted values and the actual values. Lower MSE values indicate
better model performance.
Root Mean Squared Error (RMSE): RMSE is the square root of
MSE, representing the standard deviation of the prediction errors. Similar to
MSE, lower RMSE values indicate better model performance.
Mean Absolute Error (MAE): MAE measures the average absolute difference
between the predicted values and the actual values. It is less sensitive to outliers
compared to MSE or RMSE.
R-squared (Coefficient of Determination): R-squared measures the
proportion of the variance in the target variable that is explained by the
model. Higher R-squared values indicate better model fit.
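The sketch below computes these regression metrics on a toy set of predictions using scikit-learn (RMSE is taken as the square root of MSE):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # toy actual values
y_pred = np.array([2.5, 5.0, 3.0, 8.0])   # toy predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(mse, rmse, mae, r2)
```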
For Classification
Accuracy: Accuracy is the percentage of predictions that the model makes
correctly. It is calculated by dividing the number of correct predictions by the
total number of predictions.
Precision: Precision is the percentage of positive predictions that the model
makes that are actually correct. It is calculated by dividing the number of true
positives by the total number of positive predictions.
Recall: Recall is the percentage of all positive examples that the model correctly
identifies. It is calculated by dividing the number of true positives by the total
number of positive examples.
F1 score: The F1 score combines precision and recall into a single metric by
taking their harmonic mean.
Confusion matrix: A confusion matrix is a table that shows the number of
predictions for each class, along with the actual class labels. It can be used to
visualize the performance of the model and identify areas where the model is
struggling.
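The classification metrics above can be computed on a toy set of predictions with scikit-learn, for example:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # toy ground-truth labels (1 = positive class)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```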
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems, including:
Spam filtering: Supervised learning algorithms can be trained to identify and
classify spam emails based on their content, helping users avoid unwanted
messages.
Image classification: Supervised learning can automatically classify images
into different categories, such as animals, objects, or scenes, facilitating tasks
like image search, content moderation, and image-based product
recommendations.
Medical diagnosis: Supervised learning can assist in medical diagnosis by
analyzing patient data, such as medical images, test results, and patient history,
to identify patterns that suggest specific diseases or conditions.
Fraud detection: Supervised learning models can analyze financial transactions
and identify patterns that indicate fraudulent activity, helping financial
institutions prevent fraud and protect their customers.
Natural language processing (NLP): Supervised learning plays a crucial role
in NLP tasks, including sentiment analysis, machine translation, and text
summarization, enabling machines to understand and process human language
effectively.
Advantages of Supervised learning
Supervised learning allows a model to produce outputs based on previous
experience (labeled data).
Helps to optimize performance criteria with the help of experience.
Supervised machine learning helps to solve various types of real-world
computation problems.
It performs classification and regression tasks.
It allows estimating or mapping results to new samples.
We have complete control over choosing the number of classes we want in the
training data.
Disadvantages of Supervised learning
Classifying big data can be challenging.
Training a supervised model can require a lot of computation time.
Supervised learning cannot handle all complex tasks in machine learning.
It requires a labelled data set.
It requires a training process.
Unsupervised Learning
Unsupervised learning is a type of machine learning that learns from unlabeled data.
This means that the data does not have any pre-existing labels or categories. The goal
of unsupervised learning is to discover patterns and relationships in the data without
any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither
classified nor labeled and allowing the algorithm to act on that information without
guidance. Here the task of the machine is to group unsorted information according to
similarities, patterns, and differences without any prior training of data.
Unlike supervised learning, no teacher is provided, which means no labeled training
examples are given to the machine. Therefore the machine must find the hidden
structure in unlabeled data by itself.
For example, you can use unsupervised learning to examine gathered animal data and
distinguish several groups according to the animals' traits and behaviors. These
groupings might correspond to different species, allowing you to categorize the animals
without relying on pre-existing labels.
Key Points
Unsupervised learning allows the model to discover patterns and relationships in
unlabeled data.
Clustering algorithms group similar data points together based on their inherent
characteristics.
Feature extraction captures essential information from the data, enabling the
model to make meaningful distinctions.
Label association assigns categories to the clusters based on the extracted
patterns and characteristics.
Example
Imagine you have a machine learning model trained on a large dataset of unlabeled
images, containing both dogs and cats. The model has never seen an image of a dog or
cat before, and it has no pre-existing labels or categories for these animals. Your task is
to use unsupervised learning to identify the dogs and cats in a new, unseen image.
For instance, suppose the model is given a collection of images containing both dogs
and cats that it has never seen before. The machine has no idea what the features of
dogs and cats are, so it cannot label them as "dogs" and "cats". But it can group them
according to their similarities, patterns, and differences: one group may contain all the
pictures with dogs and the other all the pictures with cats. No prior training data or
labeled examples were used.
It allows the model to work on its own to discover patterns and information that was
previously undetected. It mainly deals with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also
tend to buy Y.
Clustering
Clustering is a type of unsupervised learning that is used to group similar data points
together. Many clustering algorithms work by iteratively moving data points closer to
their cluster centers and further away from data points in other clusters. Broad
categories of clustering include:
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Commonly used clustering and related algorithms:
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
6. Gaussian Mixture Models (GMMs)
7. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
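As a small illustration of clustering, the sketch below runs k-means on two synthetic blobs of points; the data and cluster count are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two toy blobs of 2-D points (no labels are given to the algorithm).
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
               rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)     # roughly (0, 0) and (5, 5)
print(kmeans.labels_[:10])         # cluster assignment for the first few points
```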
Association rule learning
Association rule learning is a type of unsupervised learning that is used to identify
patterns in data. Association rule learning algorithms work by finding relationships
between different items in a dataset.
Some common association rule learning algorithms include:
Apriori Algorithm
Eclat Algorithm
FP-Growth Algorithm
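The core quantities behind these algorithms, support and confidence, can be illustrated with a tiny hand-rolled example (the transactions below are made up):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Confidence of the rule {bread} -> {milk}: an estimate of P(milk | bread).
conf = support({"bread", "milk"}) / support({"bread"})
print("support({bread, milk}) =", support({"bread", "milk"}))   # 0.5
print("confidence(bread -> milk) =", round(conf, 2))            # 0.67
```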
Evaluating Unsupervised Learning Models
Evaluating unsupervised learning models is an important step in ensuring that the
model is effective and useful. However, it can be more challenging than evaluating
supervised learning models, as there is usually no ground-truth data to compare the
model's predictions to.
There are a number of different metrics that can be used to evaluate unsupervised
learning models, but some of the most common ones include:
Silhouette score: The silhouette score measures how well each data point is
clustered with its own cluster members and separated from other clusters. It
ranges from -1 to 1, with higher scores indicating better clustering.
Calinski-Harabasz score: The Calinski-Harabasz score measures the ratio
between the variance between clusters and the variance within clusters. It
ranges from 0 to infinity, with higher scores indicating better clustering.
Adjusted Rand index: The adjusted Rand index measures the similarity
between two clusterings. It ranges from -1 to 1, with higher scores indicating
more similar clusterings.
Davies-Bouldin index: The Davies-Bouldin index measures the average
similarity between clusters. It ranges from 0 to infinity, with lower scores
indicating better clustering.
F1 score: The F1 score is the harmonic mean of precision and recall, two
metrics commonly used in supervised learning to evaluate classification
models. It can also be used to evaluate unsupervised learning models, such as
clustering models, when ground-truth labels are available for comparison.
Application of Unsupervised learning
Unsupervised learning can be used to solve a wide variety of problems, including:
Anomaly detection: Unsupervised learning can identify unusual patterns or
deviations from normal behavior in data, enabling the detection of fraud,
intrusion, or system failures.
Scientific discovery: Unsupervised learning can uncover hidden relationships and
patterns in scientific data, leading to new hypotheses and insights in various
scientific fields.
Recommendation systems: Unsupervised learning can identify patterns and
similarities in user behavior and preferences to recommend products, movies, or
music that align with their interests.
Customer segmentation: Unsupervised learning can identify groups of customers
with similar characteristics, allowing businesses to target marketing campaigns
and improve customer service more effectively.
Image analysis: Unsupervised learning can group images based on their content,
facilitating tasks such as image classification, object detection, and image
retrieval.
Advantages of Unsupervised learning
It does not require training data to be labeled.
Dimensionality reduction can be easily accomplished using unsupervised
learning.
Capable of finding previously unknown patterns in data.
Unsupervised learning can help you gain insights from unlabeled data that you
might not have been able to get otherwise.
Unsupervised learning is good at finding patterns and relationships in data
without being told what to look for. This can help you learn new things about
your data.
Disadvantages of Unsupervised learning
Difficult to measure accuracy or effectiveness due to lack of predefined answers
during training.
The results often have lower accuracy.
The user needs to spend time interpreting and labeling the classes that result
from the clustering.
Unsupervised learning can be sensitive to data quality, including missing values,
outliers, and noisy data.
Without labelled data, it can be difficult to evaluate the performance of
unsupervised learning models, making it challenging to assess their
effectiveness.
Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning focused on making
decisions to maximize cumulative rewards in a given situation. Unlike supervised
learning, which relies on a training dataset with predefined answers, RL involves
learning through experience. In RL, an agent learns to achieve a goal in an uncertain,
potentially complex environment by performing actions and receiving feedback through
rewards or penalties.
Key Concepts of Reinforcement Learning
Agent: The learner or decision-maker.
Environment: Everything the agent interacts with.
State: A specific situation in which the agent finds itself.
Action: All possible moves the agent can make.
Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The
agent takes actions within the environment, receives rewards or penalties, and adjusts
its behavior to maximize the cumulative reward. This learning process is characterized
by the following elements:
Policy: A strategy used by the agent to determine the next action based on the
current state.
Reward Function: A function that provides a scalar feedback signal based on
the state and action.
Value Function: A function that estimates the expected cumulative reward from
a given state.
Model of the Environment: A representation of the environment that helps in
planning by predicting future states and rewards.
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in
between. The agent is supposed to find the best possible path to reach the reward. The
following example illustrates the problem.
Picture a robot, a diamond, and fire. The goal of the robot is to get the reward, the
diamond, while avoiding the hurdles, the fire. The robot learns by trying all the possible
paths and then choosing the path that gives it the reward with the fewest hurdles. Each
right step gives the robot a reward and each wrong step subtracts from its reward. The
total reward is calculated when it reaches the final goal, the diamond.
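A compact sketch of this idea using tabular Q-learning on a made-up 2x3 grid (diamond in the top-right corner, fire in the middle of the bottom row) is given below; the rewards, grid layout, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Tiny 2x3 grid "maze". The diamond (goal) is at (0, 2); fire is at (1, 1).
# Actions: 0 = up, 1 = down, 2 = left, 3 = right.
ROWS, COLS = 2, 3
GOAL, FIRE = (0, 2), (1, 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state, action):
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1)
    if (nr, nc) == GOAL:
        return (nr, nc), 10.0, True     # reaching the diamond gives a reward
    if (nr, nc) == FIRE:
        return (nr, nc), -5.0, False    # stepping into fire subtracts reward
    return (nr, nc), -1.0, False        # every other step costs a little

rng = np.random.default_rng(0)
Q = np.zeros((ROWS, COLS, 4))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(1000):                   # episodes of trial and error
    s, done = (1, 0), False             # start in the bottom-left corner
    while not done:
        a = rng.integers(4) if rng.random() < epsilon else int(np.argmax(Q[s]))
        nxt, reward, done = step(s, a)
        Q[s][a] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[s][a])  # Q-learning update
        s = nxt

print(np.argmax(Q, axis=2))             # best action per cell (0=up, 1=down, 2=left, 3=right)
```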
Examples: object recognition, spam detection, chess playing, text summarization.
Types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event, triggered by a particular
behavior, increases the strength and frequency of that behavior. In other words,
it has a positive effect on the behavior.
Advantages:
Maximizes performance
Sustains change for a long period of time
Disadvantage: Too much reinforcement can lead to an overload of states, which
can diminish the results.
2. Negative: Negative reinforcement is the strengthening of a behavior because a
negative condition is stopped or avoided.
Advantages:
Increases behavior
Provides defiance to a minimum standard of performance
Disadvantage: It only provides enough to meet the minimum behavior.
Elements of Reinforcement Learning
i) Policy: Defines the agent’s behavior at a given time.
ii) Reward Function: Defines the goal of the RL problem by providing feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for
planning.
When some data points cross the margin, the margins in these cases are called soft
margins. When there is a soft margin to the data set, the SVM tries to minimize
(1/margin + λ(∑penalty)). Hinge loss is a commonly used penalty: if there are no
violations there is no hinge loss, and if there are violations the hinge loss is
proportional to the distance of the violation.
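As a small numeric sketch, the hinge loss mentioned above can be computed directly; the toy points and weights below are made up.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Average hinge loss max(0, 1 - y*(w.x + b)); zero when there are no margin violations."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins).mean()

X = np.array([[2.0], [1.0], [-1.5], [-0.2]])   # toy 1-D points
y = np.array([1, 1, -1, -1])                   # their labels (+1 / -1)
w, b = np.array([1.0]), 0.0

print(hinge_loss(w, b, X, y))   # only points inside or violating the margin contribute loss
```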
Till now, we were talking about linearly separable data(the group of blue balls and red
balls are separable by a straight line/linear line). What to do if data are not linearly
separable?
(Figure: original 1-D dataset for classification.)
Say our data is as shown in the figure above. SVM solves this by creating a new variable
using a kernel. We call a point x_i on the line and create a new variable y_i as a
function of its distance from the origin o. If we plot this, we get something like the plot
shown below.
The vector w represents the normal vector to the hyperplane, i.e., the direction
perpendicular to the hyperplane. The parameter b represents the offset or distance of
the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (w · x_i + b) / ||w||
Optimization:
For the hard-margin linear SVM classifier, the goal is to find the w and b that maximize
the margin, which amounts to:
minimize (1/2) ||w||^2   subject to   y_i (w · x_i + b) ≥ 1 for every training point (x_i, y_i)
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVMs) can be
divided into two main types:
Linear SVM: Linear SVMs use a linear decision boundary to separate the data
points of different classes. When the data can be precisely linearly separated,
linear SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into their
respective classes. A hyperplane that maximizes the margin between the classes
is the decision boundary.
Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot
be separated into two classes by a straight line (in the case of 2D). By using
kernel functions, nonlinear SVMs can handle nonlinearly separable data. The
original input data is transformed by these kernel functions into a higher-
dimensional feature space, where the data points can be linearly separated. A
linear SVM is used to locate a nonlinear decision boundary in this modified
space.
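A brief sketch contrasting the two types with scikit-learn's SVC is given below; the ring-shaped toy data is made up so that a linear kernel struggles while an RBF kernel separates it well.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy non-linearly separable data: the class depends on distance from the origin.
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

linear_svm = SVC(kernel="linear").fit(X, y)     # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)           # kernel maps data to a higher-dim space

print("linear kernel accuracy:", linear_svm.score(X, y))
print("rbf kernel accuracy   :", rbf_svm.score(X, y))   # noticeably higher here
```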
Popular kernel functions in SVM