UNIT-4
Clustering Algorithms
Clustering in unsupervised machine learning is the process of grouping unlabeled data into
clusters based on their similarities. The goal of clustering is to identify patterns and
relationships in the data without any prior knowledge of the data’s meaning.
Broadly, this technique is applied to group data based on patterns, such as similarities or differences, that the machine learning model finds on its own. These algorithms process raw, unclassified data objects into groups. For example, if no output labels are given for a set of clients, clustering can be used to group the clients based only on the input attributes provided in the data.
Some common clustering algorithms:
K-means Clustering: Groups data into K clusters based on how close the points are to
each other.
Hierarchical Clustering: Creates clusters by building a tree step-by-step, either
merging or splitting groups.
Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats
scattered points as noise.
Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded
areas.
Spectral Clustering: Groups data by analyzing connections between points using
graphs.
K-means is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any labeled training data.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the cluster assignments no longer change. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form one cluster.
Hence each cluster contains data points with some commonalities and is well separated from the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
K-Means Algorithm
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (These may be points other than those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster (the mean of the points assigned to it).
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
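As a rough illustration of these steps, here is a minimal NumPy sketch; the two-blob sample data, K=2, and the iteration limit are assumptions made only for this example:

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative 2-D data: two loose blobs of points
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

K = 2                                                      # Step-1: choose the number of clusters
centroids = X[rng.choice(len(X), size=K, replace=False)]   # Step-2: pick K random initial centroids

for _ in range(100):                                       # repeat until assignments stop changing
    # Step-3 / Step-5: assign every point to its closest centroid (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Step-4: place each new centroid at the mean of the points assigned to it
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):              # Step-6: no reassignment -> FINISH
        break
    centroids = new_centroids

print(centroids)                                           # Step-7: the final cluster centers
```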
Let's understand the above steps by considering the visual plots:
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is
given below:
o Let's take the number of clusters k, i.e., K=2, and try to group the dataset into two different clusters.
o We need to choose K random points or centroids to form the clusters. These points can be either points from the dataset or any other points. So, here we are selecting the below two points as the K points, which are not part of our dataset. Consider the below image:
o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We compute this using the distance measures studied earlier (e.g., Euclidean distance) and draw a median line between the two centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are nearer to the K1 (blue) centroid, and the points on the right of the line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization.
o To refine the clusters, we repeat the process by choosing new centroids. To choose the new centroids, we compute the center of gravity (mean) of the points in each cluster and place the new centroids there, as shown below:
o Next, we will reassign each data point to its new closest centroid. For this, we repeat the same process of finding a median line. The median line will look like the below image:
From the above image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line. These three points will therefore be assigned to new centroids.
As reassignment has taken place, we again go to step-4, which is finding new centroids or K-points.
o We will repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the below image:
o With the new centroids, we again draw the median line and reassign the data points. The result will look like the below image:
o We can see in the above image that no data points change sides of the line, which means our model has converged. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image:
Choosing the value of "K number of clusters" in K-means Clustering
The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, which in turn depends on the number of clusters chosen. Choosing the optimal number of clusters is not trivial. There are different ways to find the optimal number of clusters, but here we discuss the most widely used method for finding the number of clusters, or value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of
Squares, which defines the total variations within a cluster. The formula to calculate the
value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
In the above formula of WCSS,
Σ(Pi in Cluster1) distance(Pi, C1)²: the sum of the squared distances between each data point in Cluster1 and its centroid C1; the other two terms are defined in the same way for Cluster2 and Cluster3.
To measure the distance between data points and centroid, we can use any method such as
Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
o It executes K-means clustering on a given dataset for different values of K (ranging from 1 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve of the calculated WCSS values against the number of clusters K.
o The sharp point of bend, where the plot looks like an arm, is considered the best value of K.
Since the graph shows the sharp bend, which looks like an elbow, hence it is known as the
elbow method. The graph for the elbow method looks like the below image:
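A small sketch of the elbow method, assuming scikit-learn and matplotlib are available (the synthetic three-blob dataset is an illustrative assumption):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Illustrative data: three well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.7, size=(40, 2)) for c in (0, 4, 8)])

wcss = []
for k in range(1, 11):                       # K values ranging from 1 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                 # inertia_ is the WCSS for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()
```

The K at which the curve bends sharply (the "elbow") is taken as the best value.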
Neural Networks
Neural networks are machine learning models that mimic the complex functions of the
human brain. These models consist of interconnected nodes or neurons that process data,
learn patterns, and enable tasks such as pattern recognition and decision-making.
In this article, we will explore the fundamentals of neural networks, their architecture, how
they work, and their applications in various fields. Understanding neural networks is essential
for anyone interested in the advancements of artificial intelligence.
Neural networks are capable of learning and identifying patterns directly from data without
pre-defined rules. These networks are built from several key components:
1. Neurons: The basic units that receive inputs; each neuron is governed by a threshold and an activation function.
2. Connections: Links between neurons that carry information, regulated by weights and
biases.
3. Weights and Biases: These parameters determine the strength and influence of
connections.
4. Propagation Functions: Mechanisms that help process and transfer data across layers
of neurons.
5. Learning Rule: The method that adjusts weights and biases over time to improve
accuracy.
Learning in neural networks follows a structured, three-stage process:
1. Input Computation: Data is fed into the network.
2. Output Generation: Based on the current parameters, the network generates an
output.
3. Iterative Refinement: The network refines its output by adjusting weights and biases,
gradually improving its performance on diverse tasks.
In an adaptive learning environment:
The neural network is exposed to a simulated scenario or dataset.
Parameters such as weights and biases are updated in response to new data or
conditions.
With each adjustment, the network’s response evolves, allowing it to adapt effectively
to different tasks or environments.
Importance of Neural Networks
Neural networks are pivotal in identifying complex patterns, solving intricate challenges, and
adapting to dynamic environments. Their ability to learn from vast amounts of data is
transformative, impacting technologies like natural language processing, self-driving
vehicles, and automated decision-making.
Neural networks streamline processes, increase efficiency, and support decision-making
across various industries. As a backbone of artificial intelligence, they continue to drive
innovation, shaping the future of technology.
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input neuron in
the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A
neural network can have one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of these
outputs varies depending on the specific task (e.g., classification, regression).
Working of Neural Networks
Forward Propagation
When data is input into the network, it passes through the network in the forward direction,
from the input layer through the hidden layers to the output layer. This process is known as
forward propagation. Here’s what happens during this phase:
1. Linear Transformation: Each neuron in a layer receives inputs, which are multiplied
by the weights associated with the connections. These products are summed together,
and a bias is added to the sum. This can be represented mathematically
as: z = w1x1 + w2x2 + … + wnxn + b, where w represents the weights, x represents the inputs, and b is the bias.
2. Activation: The result of the linear transformation (denoted as z) is then passed
through an activation function. The activation function is crucial because it introduces
non-linearity into the system, enabling the network to learn more complex patterns.
Popular activation functions include ReLU, sigmoid, and tanh.
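A minimal NumPy sketch of one forward pass, assuming a tiny network with 3 inputs, 4 hidden units, and 1 output (these sizes and the random weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)                           # input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output-layer weights and biases

def relu(z):
    return np.maximum(0.0, z)                    # popular non-linear activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z1 = W1 @ x + b1          # linear transformation: z = w1*x1 + ... + wn*xn + b
a1 = relu(z1)             # activation introduces non-linearity
z2 = W2 @ a1 + b2         # linear transformation in the output layer
y_hat = sigmoid(z2)       # e.g. a probability for a binary decision
print(y_hat)
```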
Backpropagation
After forward propagation, the network evaluates its performance using a loss function,
which measures the difference between the actual output and the predicted output. The goal
of training is to minimize this loss. This is where backpropagation comes into play:
1. Loss Calculation: The network calculates the loss, which provides a measure of error
in the predictions. The loss function could vary; common choices are mean squared
error for regression tasks or cross-entropy loss for classification.
2. Gradient Calculation: The network computes the gradients of the loss function with
respect to each weight and bias in the network. This involves applying the chain rule
of calculus to find out how much each part of the output error can be attributed to
each weight and bias.
3. Weight Update: Once the gradients are calculated, the weights and biases are
updated using an optimization algorithm like stochastic gradient descent (SGD). The
weights are adjusted in the opposite direction of the gradient to minimize the loss. The
size of the step taken in each update is determined by the learning rate.
Iteration
This process of forward propagation, loss calculation, backpropagation, and weight update is
repeated for many iterations over the dataset. Over time, this iterative process reduces the
loss, and the network’s predictions become more accurate.
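The full loop (forward propagation, loss, backpropagation, weight update) can be sketched for a tiny one-hidden-layer network; the XOR toy data, layer sizes, learning rate, and epoch count below are assumptions chosen only to keep the example small:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy inputs
y = np.array([0.0, 1.0, 1.0, 0.0])                            # XOR targets

W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
lr = 0.5                                   # learning rate (step size)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    for x, t in zip(X, y):
        # forward propagation
        z1 = W1 @ x + b1
        a1 = np.maximum(0.0, z1)           # ReLU activation
        z2 = W2 @ a1 + b2
        p = sigmoid(z2)                    # predicted probability

        # backpropagation (chain rule); with sigmoid output + cross-entropy loss, dL/dz2 = p - t
        dz2 = p - t
        dW2, db2 = np.outer(dz2, a1), dz2
        dz1 = (W2.T @ dz2) * (z1 > 0)      # derivative of ReLU
        dW1, db1 = np.outer(dz1, x), dz1

        # SGD update: step in the opposite direction of the gradient
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

# predictions after training
print([sigmoid(W2 @ np.maximum(0.0, W1 @ x + b1) + b2).item() for x in X])
```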
Through these steps, neural networks can adapt their parameters to better approximate the
relationships in the data, thereby improving their performance on tasks such as classification,
regression, or any other predictive modeling.
Learning with Supervised Learning
In supervised learning, a neural network learns from labeled input-output pairs provided by a
teacher. The network generates outputs based on inputs, and by comparing these outputs to
the known desired outputs, an error signal is created. The network iteratively adjusts its
parameters to minimize errors until it reaches an acceptable performance level.
Types of Neural Networks
There are several types of neural networks; some of the most commonly used are:
Feedforward Networks: A feedforward neural network is a simple artificial neural
network architecture in which data moves from input to output in a single direction.
Single-layer Perceptron: A single-layer perceptron consists of only one layer of neurons. It takes inputs, applies weights, sums them up, and uses an activation function to produce an output.
Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with
three or more layers, including an input layer, one or more hidden layers, and an
output layer. It uses nonlinear activation functions.
Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is
a specialized artificial neural network designed for image processing. It employs
convolutional layers to automatically learn hierarchical features from input images,
enabling effective image recognition and classification.
Recurrent Neural Network (RNN): A Recurrent Neural Network (RNN) is an artificial neural network designed for sequential data processing. It makes use of feedback loops, which allow information to persist within the network, making it appropriate for applications where contextual dependencies are critical, such as time series prediction and natural language processing.
Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to
overcome the vanishing gradient problem in training RNNs. It uses memory cells and
gates to selectively read, write, and erase information.
Advantages of Neural Networks
Neural networks are widely used in many different applications because of their many
benefits:
Adaptability: Neural networks are useful for activities where the link between inputs
and outputs is complex or not well defined because they can adapt to new situations
and learn from data.
Pattern Recognition: Their proficiency in pattern recognition makes them effective in tasks such as audio and image recognition, natural language processing, and other intricate data patterns.
Parallel Processing: Because neural networks are capable of parallel processing by
nature, they can process numerous jobs at once, which speeds up and improves the
efficiency of computations.
Non-Linearity: Neural networks are able to model and comprehend complicated
relationships in data by virtue of the non-linear activation functions found in neurons,
which overcome the drawbacks of linear models.
Disadvantages of Neural Networks
Neural networks, while powerful, are not without drawbacks and difficulties:
Computational Intensity: Training large neural networks can be a laborious and computationally intensive process that requires a lot of computing power.
Black box Nature: As “black box” models, neural networks pose a problem in
important applications since it is difficult to understand how they make decisions.
Overfitting: Overfitting is a phenomenon in which neural networks commit training
material to memory rather than identifying patterns in the data. Although
regularization approaches help to alleviate this, the problem still exists.
Need for Large datasets: For efficient training, neural networks frequently need
sizable, labeled datasets; otherwise, their performance may suffer from incomplete or
skewed data.
3. Analyze how linear regression is used in signal prediction for communication systems and evaluate its effectiveness in minimizing transmission errors.
4. Compare and analyze the role of logistic regression and decision trees in classification
tasks within wireless communication networks. Which method provides better
accuracy and why?
Logistic regression is used for binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value), then the input belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
Key Points:
Logistic regression predicts the output of a categorical dependent variable. Therefore,
the outcome must be a categorical or discrete value.
It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types
of dependent variables, such as “low”, “Medium”, or “High”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding these assumptions is important to ensure that we apply the model appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning there is no correlation between observations.
2. Binary dependent variables: It takes the assumption that the dependent variable must
be binary or dichotomous, meaning it can take only two values. For more than two
categories SoftMax functions are used.
3. Linearity relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should
be linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size should be sufficiently large.
Sigmoid Function
So far, we’ve covered the basics of logistic regression, but now let’s focus on the most
important function that forms the core of logistic regression.
The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
It maps any real value into another value within the range 0 to 1. The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a curve like the letter “S”.
The S-form curve is called the Sigmoid function or the logistic function.
Working of Logistic Regression:
The logistic regression model transforms the linear regression function continuous value
output into categorical value output using a sigmoid function, which maps any real-valued set
of independent variables input into a value between 0 and 1. This function is known as the
logistic function.
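A small sketch of this mapping (the weights, bias, input, and 0.5 threshold below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.4])   # assumed learned coefficients
b = 0.1                     # assumed intercept
x = np.array([2.0, 1.5])    # one observation with two features

z = w @ x + b               # linear-regression part: a real-valued score
p = sigmoid(z)              # mapped into (0, 1): probability of Class 1
label = int(p > 0.5)        # apply the 0.5 threshold
print(p, label)
```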
Evaluation in Logistic Regression Model:
So far, we’ve covered the working of logistic regression. Now, let’s dive into the evaluation of logistic regression and understand why it’s important.
Evaluating the model helps us assess its performance and ensure that it generalizes well to new data.
We can evaluate the logistic regression model using the following metrics:
Accuracy: Accuracy provides the proportion of correctly classified instances.
Accuracy = (TruePositives + TrueNegatives) / Total
Precision: Precision focuses on the accuracy of positive predictions.
Precision = TruePositives / (TruePositives + FalsePositives)
Recall (Sensitivity or True Positive Rate): Recall measures the proportion of
correctly predicted positive instances among all actual positive instances.
Recall = TruePositives / (TruePositives + FalseNegatives)
Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC
curve plots the true positive rate against the false positive rate at various
thresholds. AUC-ROC measures the area under this curve, providing an aggregate
measure of a model’s performance across different classification thresholds.
Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, AUC-
PR measures the area under the precision-recall curve, providing a summary of a
model’s performance across different precision-recall trade-offs.
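A quick sketch of the first three metrics computed from assumed confusion-matrix counts (the counts are made up purely for illustration):

```python
# Assumed confusion-matrix counts (illustrative only)
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # proportion of correct predictions
precision = TP / (TP + FP)                    # accuracy of positive predictions
recall    = TP / (TP + FN)                    # coverage of actual positives
print(accuracy, precision, recall)
```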
In the classic restaurant-waiting example, the decision-tree learning algorithm has no reason to include tests for Raining and Reservation, because it can classify all the examples without them. It has also detected an interesting and previously unsuspected pattern: SR will wait for Thai food on weekends. It is also bound to make some mistakes for cases where it has seen no examples.
We can evaluate the performance of a learning algorithm with a learning curve, as shown in Figure 19.7. For this figure we have 100 examples at our disposal, which we split randomly into a training set and a test set. We learn a hypothesis h with the training set and measure its accuracy with the test set. We can do this starting with a training set of size 1 and increasing one at a time up to size 99. For each size, we actually repeat the process of randomly splitting into training and test sets 20 times, and average the results of the 20 trials. The curve shows that as the training set size grows, the accuracy increases. (For this reason, learning curves are also called happy graphs.) In this graph we reach 95% accuracy, and it looks as if the curve might continue to increase if we had more data.
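The learning-curve procedure can be sketched with scikit-learn, using a synthetic 100-example dataset and a decision-tree learner as stand-ins for the restaurant data (both are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

sizes, mean_acc = range(10, 100, 10), []
for n in sizes:                                # growing training-set sizes
    accs = []
    for trial in range(20):                    # average over 20 random splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=n, stratify=y, random_state=trial)
        h = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        accs.append(accuracy_score(y_te, h.predict(X_te)))
    mean_acc.append(np.mean(accs))

print(list(zip(sizes, mean_acc)))              # accuracy typically rises with more data
```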
For linearly separable problems (e.g., binary signal detection, basic spectrum
sensing), Logistic Regression performs well.
For complex, nonlinear classification tasks (e.g., modulation classification,
interference detection), Decision Trees outperform Logistic Regression due to
their ability to capture intricate relationships.
Decision Trees are preferred when feature interactions are crucial and data has
a nonlinear structure.
Logistic Regression is better suited for simpler problems where interpretability
and computational efficiency are critical.