Unit 3 Unsupervised Learning & Neural Network
In the previous topic, we learned about supervised machine learning, in which models are trained on labeled data under the supervision of a training dataset. But there are many cases in which we do not have labeled data and need to find the hidden patterns in a given dataset. To solve such cases, we need unsupervised learning techniques.
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on the given dataset, which means it has no idea about the features of the dataset. The task of the unsupervised learning algorithm is to identify the image features on its own. It will perform this task by clustering the image dataset into groups according to the similarities between images.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is much like how a human learns to think through their own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
o In the real world, we do not always have input data with corresponding outputs, so to solve such cases we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized and corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find the hidden patterns, and then a suitable algorithm such as k-means clustering or hierarchical clustering is applied.
Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
The unsupervised learning algorithm can be further categorized into two types of problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have little or no similarity with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective: for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical application of association rules is Market Basket Analysis.
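As a rough illustration of support and confidence, the two standard measures behind such rules, here is a minimal Python sketch over a made-up list of market-basket transactions (all items and numbers are purely illustrative):

transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = support(both) / support(antecedent).
    return support(antecedent | consequent) / support(antecedent)

# Rule "bread -> butter": how often do bread buyers also buy butter?
print(support({"bread", "butter"}))       # 0.6  (3 of 5 baskets)
print(confidence({"bread"}, {"butter"}))  # 0.75 (3 of 4 bread baskets)

A real market-basket miner, such as the Apriori algorithm listed below, searches for all rules whose support and confidence exceed chosen thresholds.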
Some popular unsupervised learning algorithms are listed below:
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
K-Means Clustering-
Step-01:
Choose the number of clusters K.
Step-02:
Randomly select any K data points as the initial cluster centers.
Step-03:
Calculate the distance between each data point and each cluster center.
The distance may be calculated either by using the given distance function or by using the Euclidean distance formula.
Step-04:
Assign each data point to the cluster whose center is nearest to it.
Step-05:
Re-compute the center of each newly formed cluster by taking the mean of all the points contained in it.
Step-06:
Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping criteria is met-
Centers of the newly formed clusters do not change
Data points remain in the same cluster
Maximum number of iterations is reached
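A minimal Python/NumPy sketch of this procedure is given below. The distance function is pluggable; the Manhattan distance defined here matches the distance function given in Problem-01 below, and the sketch assumes no cluster ever becomes empty.

import numpy as np

def manhattan(points, center):
    # Distance function of the form |x2 - x1| + |y2 - y1|.
    return np.abs(points - center).sum(axis=1)

def kmeans(points, centers, distance=manhattan, max_iter=100):
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)   # Step-01/02: K given centers
    labels = None
    for _ in range(max_iter):                    # Step-06: repeat until stable
        # Step-03: distance between each data point and each cluster center
        dists = np.stack([distance(points, c) for c in centers], axis=1)
        # Step-04: assign each point to the cluster with the nearest center
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                # points stayed in their clusters
        labels = new_labels
        # Step-05: new center = mean of all points contained in that cluster
        centers = np.array([points[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return centers, labels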
Advantages-
Point-01:
K-Means clustering is simple to implement and computationally efficient, which makes it scale well to large datasets.
Point-02:
It is guaranteed to converge, although possibly only to a local optimum that depends on the initial cluster centers.
Disadvantages-
Point-01:
The number of clusters K must be specified in advance, and the result is sensitive to the choice of initial cluster centers.
Point-02:
It does not handle noisy data and outliers well, and it struggles to identify clusters with non-convex shapes.
Problem-01:
Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
Ρ(a, b) = |x2 – x1| + |y2 – y1|
Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution-
Iteration-01:
We calculate the distance of each point from each of the three cluster centers.
The distance is calculated by using the given distance function.
The following illustration shows the calculation of the distance between point A1(2, 10) and each of the three cluster centers-
Ρ(A1, C1) = |2 – 2| + |10 – 10| = 0
Ρ(A1, C2) = |5 – 2| + |8 – 10| = 3 + 2 = 5
Ρ(A1, C3) = |1 – 2| + |2 – 10| = 1 + 8 = 9
In a similar manner, we calculate the distance of the other points from each of the three cluster centers.
Next, we draw a table showing all the results. Using the table, we decide which point belongs to which cluster: a given point belongs to the cluster whose center is nearest to it.
Given points   Distance from C1(2, 10)   Distance from C2(5, 8)   Distance from C3(1, 2)   Point belongs to cluster
A1(2, 10)      0                         5                        9                        C1
A2(2, 5)       5                         6                        4                        C3
A3(8, 4)       12                        7                        9                        C2
A4(5, 8)       5                         0                        10                       C2
A5(7, 5)       10                        5                        9                        C2
A6(6, 4)       10                        5                        7                        C2
A7(1, 2)       9                         10                       0                        C3
A8(4, 9)       3                         2                        10                       C2
Cluster-01:
First cluster contains points A1(2, 10).
Cluster-02:
Second cluster contains points A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9).
Cluster-03:
Third cluster contains points A2(2, 5), A7(1, 2).
Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
We have only one point A1(2, 10) in Cluster-01.
So, cluster center remains the same.
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
Iteration-02:
We calculate the distance of each point from each of the three new cluster centers.
The distance is calculated by using the given distance function.
The following illustration shows the calculation of the distance between point A1(2, 10) and each of the three cluster centers-
Ρ(A1, C1) = |2 – 2| + |10 – 10| = 0
Ρ(A1, C2) = |6 – 2| + |6 – 10| = 4 + 4 = 8
Ρ(A1, C3) = |1.5 – 2| + |3.5 – 10| = 0.5 + 6.5 = 7
In a similar manner, we calculate the distance of the other points from each of the three cluster centers.
Next, we draw a table showing all the results. Using the table, we decide which point belongs to which cluster: a given point belongs to the cluster whose center is nearest to it.
Given points   Distance from C1(2, 10)   Distance from C2(6, 6)   Distance from C3(1.5, 3.5)   Point belongs to cluster
A1(2, 10)      0                         8                        7                            C1
A2(2, 5)       5                         5                        2                            C3
A3(8, 4)       12                        4                        7                            C2
A4(5, 8)       5                         3                        8                            C2
A5(7, 5)       10                        2                        7                            C2
A6(6, 4)       10                        2                        5                            C2
A7(1, 2)       9                         9                        2                            C3
A8(4, 9)       3                         5                        8                            C1
Cluster-01:
First cluster contains points A1(2, 10), A8(4, 9).
Cluster-02:
Second cluster contains points A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4).
Cluster-03:
Third cluster contains points A2(2, 5), A7(1, 2).
Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
Thus, after the second iteration, the three cluster centers are C1(3, 9.5), C2(6.5, 5.25) and C3(1.5, 3.5).
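Assuming the kmeans sketch given after Step-06 above, the following snippet reproduces this result by running exactly two iterations:

points  = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centers = [(2, 10), (5, 8), (1, 2)]

final_centers, labels = kmeans(points, centers, distance=manhattan, max_iter=2)
print(final_centers)   # [[3.  9.5] [6.5  5.25] [1.5  3.5]]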
Problem-02:
Use the K-Means Algorithm to cluster the following five points (with (x, y) representing locations) into two clusters:
A(2, 2), B(3, 2), C(1, 1), D(3, 1), E(1.5, 0.5)
Initial cluster centers are: C1 = A(2, 2) and C2 = C(1, 1). Use the Euclidean distance formula.
Solution-
We calculate the distance of each point from each of the two cluster centers.
The distance is calculated by using the Euclidean distance formula:
Ρ(a, b) = sqrt[ (x2 – x1)² + (y2 – y1)² ]
The following illustration shows the calculation of the distance between point A(2, 2) and each of the two cluster centers-
Ρ(A, C1) = sqrt[ (2 – 2)² + (2 – 2)² ] = sqrt[ 0 + 0 ] = 0
Ρ(A, C2) = sqrt[ (1 – 2)² + (1 – 2)² ] = sqrt[ 1 + 1 ] = sqrt[ 2 ] = 1.41
In a similar manner, we calculate the distance of the other points from each of the two cluster centers.
Next, we draw a table showing all the results. Using the table, we decide which point belongs to which cluster: a given point belongs to the cluster whose center is nearest to it.
Given points   Distance from C1(2, 2)   Distance from C2(1, 1)   Point belongs to cluster
A(2, 2)        0                        1.41                     C1
B(3, 2)        1                        2.24                     C1
C(1, 1)        1.41                     0                        C2
D(3, 1)        1.41                     2                        C1
E(1.5, 0.5)    1.58                     0.71                     C2
Cluster-01:
First cluster contains points A(2, 2), B(3, 2), D(3, 1).
Cluster-02:
Second cluster contains points C(1, 1), E(1.5, 0.5).
Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
Center of Cluster-01
= ((2 + 3 + 3)/3, (2 + 2 + 1)/3)
= (2.67, 1.67)
For Cluster-02:
Center of Cluster-02
= ((1 + 1.5)/2, (1 + 0.5)/2)
= (1.25, 0.75)
Thus, after the first iteration, the two cluster centers are C1(2.67, 1.67) and C2(1.25, 0.75).
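The same kmeans sketch from earlier reproduces this result once the Euclidean distance is swapped in:

import numpy as np

def euclidean(points, center):
    # sqrt[ (x2 - x1)^2 + (y2 - y1)^2 ]
    return np.sqrt(((points - center) ** 2).sum(axis=1))

points  = [(2, 2), (3, 2), (1, 1), (3, 1), (1.5, 0.5)]
centers = [(2, 2), (1, 1)]

final_centers, labels = kmeans(points, centers, distance=euclidean, max_iter=1)
print(final_centers)   # [[2.67 1.67] [1.25 0.75]]  (rounded)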
Neural Networks
Neural networks are composed of a collection of nodes. The nodes are spread out across at least three layers. The three layers are:
• An input layer
• A "hidden" layer
• An output layer
These three layers are the minimum. Neural networks can have more than one hidden layer, in
addition to the input layer and output layer.
No matter which layer it is part of, each node performs some sort of processing task or function
on whatever input it receives from the previous node (or from the input layer). Essentially, each
node contains a mathematical formula, with each variable within the formula weighted
differently. If the output of applying that mathematical formula to the input exceeds a certain
threshold, the node passes data to the next layer in the neural network. If the output is below the
threshold, no data is passed to the next layer.
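As a rough illustration of this thresholding behaviour, here is a single node in Python; the inputs, weights and threshold are made-up values, not a trained network:

def node(inputs, weights, threshold=1.0):
    # Each input is weighted differently, then summed.
    activation = sum(w * x for w, x in zip(weights, inputs))
    # Pass data forward only if the result exceeds the threshold.
    return activation if activation > threshold else 0.0

print(node(inputs=[0.5, 0.8, 0.2], weights=[1.0, 0.5, 2.0]))  # 1.3 -> fires
print(node(inputs=[0.1, 0.2, 0.1], weights=[1.0, 0.5, 2.0]))  # 0.0 -> silent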
Imagine that the Acme Corporation has an accounting department with a strict hierarchy. Acme
accounting department employees at the manager level approve expenses below $1,000,
directors approve expenses below $10,000, and the CFO approves any expenses that exceed
$10,000. When employees from other departments of Acme Corp. submit their expenses, they
first go to the accounting managers. Any expense over $1,000 gets passed to a director, while
expenses below $1,000 stay at the managerial level — and so on.
The accounting department of the Acme Corp. functions somewhat like a neural network. When
employees submit their expense reports, this is like a neural network's input layer. Each manager
and director is like a node within the neural network.
And, just as one accounting manager may ask another manager for assistance in interpreting an
expense report before passing it along to an accounting director, neural networks can be
architected in a variety of ways. Nodes can communicate in multiple directions.
There is no limit on how many nodes and layers a neural network can have, and these nodes can interact in almost any way. Because of this, the list of types of neural networks is ever-expanding. But they can roughly be sorted into these two categories:
• Shallow neural networks usually have only one hidden layer
• Deep neural networks have multiple hidden layers
Shallow neural networks are fast and require less processing power than deep neural networks,
but they cannot perform as many complex tasks as deep neural networks.
Below is an incomplete list of the types of neural networks that may be used today:
Perceptron neural networks are simple, shallow networks with an input layer and an output
layer.
Multilayer perceptron neural networks add complexity to perceptron networks, and include a
hidden layer.
Feed-forward neural networks only allow their nodes to pass information to a forward node.
Recurrent neural networks can go backwards, allowing the output from some nodes to impact
the input of preceding nodes.
Modular neural networks combine two or more neural networks in order to arrive at the output.
Radial basis function neural network nodes use a specific kind of mathematical function called
a radial basis function.
Liquid state machine neural networks feature nodes that are randomly connected to each other.
Residual neural networks allow data to skip ahead via a process called identity mapping,
combining the output from early layers with the output of later layers.
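As a rough illustration of identity mapping, the sketch below adds a residual block's unchanged input to its output, letting the data "skip ahead"; the ReLU layer and weights are made-up stand-ins for a real trained layer:

import numpy as np

def layer(x, w):
    # An ordinary hidden-layer transform (linear step + ReLU).
    return np.maximum(0.0, w @ x)

def residual_block(x, w):
    # Identity mapping: the block's input is added back to its output.
    return layer(x, w) + x

x = np.array([1.0, -2.0, 0.5])
w = np.eye(3) * 0.1                    # illustrative weights
print(residual_block(x, w))            # [ 1.1  -2.    0.55]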
Generalization
Generalization in machine learning refers to the ability of a trained model to accurately make
predictions on new, unseen data. The purpose of generalization is to equip the model to
understand the patterns and relationships within its training data and apply them to previously
unseen examples from within the same distribution as the training set. Generalization is
foundational to the practical usefulness of machine learning and deep learning algorithms
because it allows them to produce models that can make reliable predictions in real-world
scenarios.
Generalization is important because the true test of a model's effectiveness is not how well it
performs on the training data, but rather how well it generalizes to new and unseen data. If a
model fails to generalize, it may exhibit high accuracy on the training set but will likely perform
poorly on real-world examples. This limitation renders the model impractical and unreliable in
practical applications.
A spam email classifier is a great example of generalization in machine learning. Suppose you
have a training dataset containing emails labeled as either spam or not spam and your goal is to
build a model that can accurately classify incoming emails as spam or legitimate based on their
content.
During the training phase, the machine learning algorithm learns from the set of labeled emails,
extracting relevant features and patterns to make predictions. The model optimizes its parameters
to minimize the training error and achieve high accuracy on the training data.
Now, the true test of the model's effectiveness lies in its ability to generalize to new, unseen
emails. When new emails arrive, the model needs to accurately classify them as spam or
legitimate without prior exposure to their content. This is where generalization comes in.
In this case, generalization enables the model to identify the underlying patterns and
characteristics that distinguish spam from legitimate emails. It allows the model to generalize its
learned knowledge beyond the specific examples in the training set and apply it to unseen data.
Without generalization, the model may become too specific to the training set, memorizing
specific words or phrases that were common in the training data and failing to understand new
examples. As a result, the model could incorrectly classify legitimate emails as spam or fail to
detect new spam patterns.
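A minimal scikit-learn sketch of this train-versus-unseen distinction is shown below; the six-email corpus is made up, and the score on the held-out split, emails the model never saw during training, is the generalization estimate:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money click here", "project report attached",
          "claim your free reward", "lunch with the team today"]
labels = [1, 0, 1, 0, 1, 0]            # 1 = spam, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.33, random_state=0)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Training accuracy flatters the model; test accuracy measures generalization.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))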
Have you ever noticed that your model makes false predictions on your testing data? Even though you have trained your model with enough data, you still get false negatives or false positives on your test data. Why is that?
Either your model is underfitting or it is overfitting to your training data. Generalization is a measure of how well your model performs at predicting unseen data, so it is important to come up with the best-generalized model to give better performance against future data. Let us first understand what underfitting and overfitting are, and then see the best practices for training a generalized model.
(Figure: A: Underfitting, B: Generalized, C: Overfitting)
What is Underfitting?
Underfitting is a state where the model cannot model the training data, and is also not able to generalize to new data. You can notice it with the help of the loss function during training: a simple rule of thumb is that if both the training loss and the cross-validation loss are high, then your model is underfitting.
Lack of data, too few features, lack of variance in the training data, or a high regularization rate can cause underfitting. A simple solution is to add more shuffled data to your training set. Depending on what is causing your model to underfit, you can try introducing more meaningful features, feature crosses, or higher-order polynomial features, or reducing the regularization rate if you are using regularization. In some cases, trying a different training algorithm will work fine.
What is Overfitting?
Overfitting is a situation where your model learns the variance in the training data too closely; experts describe it as the model starting to memorize all the noise instead of learning the signal. A simple rule of thumb to identify overfitting is that if your training loss is low and your cross-validation loss is high, then your model is overfitting.
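Both rules of thumb can be wrapped in a small diagnostic helper; the loss thresholds below are made-up illustrative values, not universal constants:

def diagnose(train_loss, val_loss, high=1.0, gap=0.5):
    if train_loss > high and val_loss > high:
        return "underfitting: both losses are high"
    if train_loss < high and val_loss - train_loss > gap:
        return "overfitting: low training loss, high validation loss"
    return "roughly generalized"

print(diagnose(train_loss=1.8, val_loss=1.9))  # underfitting
print(diagnose(train_loss=0.1, val_loss=1.2))  # overfitting
print(diagnose(train_loss=0.3, val_loss=0.4))  # roughly generalized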
Unclean data, too few training steps, or high model complexity (due to large weights) can cause overfitting. It is always recommended to preprocess data and create a good data pipeline. Select only necessary and meaningful features with good variance, and reduce the complexity of the model using a good regularization algorithm (the L1 norm or L2 norm).
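As a rough sketch of that last suggestion, the snippet below fits scikit-learn's Ridge (L2 norm) and Lasso (L1 norm) on synthetic data in which only the first of 20 features matters; the L1 penalty typically zeroes out most of the irrelevant weights:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # 20 features, 19 of them irrelevant
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)

l2 = Ridge(alpha=1.0).fit(X, y)         # L2 norm: shrinks all weights
l1 = Lasso(alpha=0.1).fit(X, y)         # L1 norm: drives weights to exactly zero

print("nonzero Ridge weights:", np.sum(l2.coef_ != 0))  # typically all 20
print("nonzero Lasso weights:", np.sum(l1.coef_ != 0))  # typically just a few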
Competitive Learning
Competitive learning is a concept in machine learning where models are trained to improve their
performance in competitive environments, such as online coding competitions, gaming, and
multi-agent systems. This approach enables models to adapt and learn from interactions with
other agents, users, or systems, balancing exploration for learning and competition for resources
or users.
One of the key challenges in competitive learning is finding the right balance between
exploration and exploitation. Exploration involves making suboptimal choices to acquire new
information, while exploitation focuses on making the best choices based on the current
knowledge. In competitive environments, learning algorithms must consider not only their own
performance but also the performance of other competing agents.
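As a rough illustration of this trade-off, the sketch below runs an epsilon-greedy strategy on a made-up two-action problem: with probability epsilon it explores a random action, otherwise it exploits its current knowledge.

import random

true_rates = {"A": 0.3, "B": 0.7}       # hidden true win rates (made up)
estimates = {"A": 0.0, "B": 0.0}        # the agent's current knowledge
counts = {"A": 0, "B": 0}
epsilon = 0.1                           # explore 10% of the time

for _ in range(1000):
    if random.random() < epsilon:       # exploration: a deliberately suboptimal pick
        action = random.choice(list(true_rates))
    else:                               # exploitation: best action so far
        action = max(estimates, key=estimates.get)
    reward = 1.0 if random.random() < true_rates[action] else 0.0
    counts[action] += 1
    # Incremental average keeps a running estimate of each action's reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)                        # estimates approach the true rates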
Recent research in competitive learning has explored various aspects of the field, such as
accelerating graph quantization, learning from source code competitions, and understanding the
impact of various parameters on learning processes in online coding competitions. These studies
have provided valuable insights into the nuances and complexities of competitive learning, as
well as the current challenges faced by researchers and practitioners.
Practical applications of competitive learning can be found in various domains, such as:
1. Online coding competitions: Competitive learning can help improve the performance of
participants by analyzing their behavior, approach, emotions, and problem difficulty levels.
2. Multi-agent systems: In settings where multiple agents interact and compete, competitive
learning can enable agents to adapt and cooperate more effectively.
3. Gaming: Competitive learning can be used to train game-playing agents to improve their
performance against human or AI opponents.
A company case study in competitive learning is the CodRep Machine Learning on Source Code
Competition, which aimed to create a common playground for machine learning and software
engineering research communities. The competition facilitated interaction between researchers
and practitioners, leading to advancements in the field.