
UNIT IV

MACHINE LEARNING TECHNIQUES FOR COMMUNICATION SYSTEMS

1. Analyze the effectiveness of clustering algorithms like K-means in optimizing
dynamic spectrum management and discuss their impact on network performance.

Clustering Algorithms
Clustering in unsupervised machine learning is the process of grouping unlabeled data into
clusters based on their similarities. The goal of clustering is to identify patterns and
relationships in the data without any prior knowledge of the data’s meaning.
Broadly, this technique groups data according to patterns, such as similarities or differences, that the model finds on its own. These algorithms are used to process raw, unclassified data objects into groups. For example, if client records carry no output labels, this technique can be used to group clients based only on the input parameters provided in the data.
Some common clustering algorithms:
 K-means Clustering: Groups data into K clusters based on how close the points are to
each other.
 Hierarchical Clustering: Creates clusters by building a tree step-by-step, either
merging or splitting groups.
 Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats
scattered points as noise.
 Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded
areas.
 Spectral Clustering: Groups data by analyzing connections between points using
graphs.

K-Means Clustering Algorithm


K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the categories in an unlabeled dataset on its own, without the need for any training labels.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between each data point and the centroid of its cluster.
The algorithm takes the unlabeled dataset as input, divides it into k clusters, and repeats the process until the cluster assignments no longer change. The value of k must be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points near a particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:

K-Means Algorithm
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not come from the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Compute a new centroid for each cluster (the mean of the points assigned to it).
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, then go to step-4; else go to FINISH.
Step-7: The model is ready.
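These steps can be sketched in a few lines of code. The following is a minimal illustration, assuming scikit-learn and NumPy are available and using synthetic two-dimensional data with K=2 (both the data and the cluster count are assumptions chosen for the example):

```python
# Minimal K-means sketch on synthetic, unlabeled 2-D data (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
               rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))])

# Step-1: choose K in advance (here K=2, as in the worked example below).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Steps 2-7: pick initial centroids, assign points, recompute centroids, repeat until stable.
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)   # final centroids
print(labels[:10])               # cluster index assigned to the first few points
```

The same result could be coded from scratch using the distance and mean computations of Steps 3 and 4; the library call simply packages that loop.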
Let's understand the above steps by considering the visual plots:
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is
given below:
o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them into
different clusters. It means here we will try to group these datasets into two different
clusters.
o We need to choose some random k points or centroid to form the cluster. These points
can be either the points from the dataset or any other point. So, here we are selecting
the below two points as k points, which are not the part of our dataset. Consider the
below image:

o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute it by applying some mathematics that we have studied to
calculate the distance between two points. So, we will draw a median between both
the centroids. Consider the below image:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 (blue) centroid, and the points to the right of the line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization.
o To refine the clusters, we repeat the process with new centroids. Each new centroid is placed at the center of gravity of the points currently assigned to its cluster, as shown below:

o Next, we will reassign each datapoint to the new centroid. For this, we will repeat the
same process of finding a median line. The median will be like below image:

From the above image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points will be reassigned to the other centroid.
As reassignment has taken place, we again go to step-4, which is finding new centroids or K-points.
o We repeat the process by finding the center of gravity of each cluster's points, so the new centroids will be as shown in the below image:

o As we have the new centroids, we again draw the median line and reassign the data points, giving the following image:

o We can see in the above image that there are no points on the wrong side of the line, which means our model has converged. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image:
Choosing the value of "K number of clusters" in K-means Clustering
The performance of the K-means clustering algorithm depends on how compact and well-separated the clusters it forms are. Choosing the optimal number of clusters is itself a challenging task. There are several ways to find the optimal number of clusters, but here we discuss the most widely used method for finding the number of clusters, or value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of
Squares, which defines the total variations within a cluster. The formula to calculate the
value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
In the above formula of WCSS,
Σ(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between each data point in Cluster 1 and its centroid C1; the other two terms are analogous.
To measure the distance between data points and centroid, we can use any method such as
Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
o It executes the K-means clustering on a given dataset for different K values (ranges
from 1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The sharp point of bend in the plot (where the curve looks like an arm) is considered the best value of K.
Since the graph shows a sharp bend that looks like an elbow, the method is known as the elbow method. The graph for the elbow method looks like the below image:
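As a hedged sketch of this procedure, the code below runs K-means for K = 1 to 10 on synthetic data and prints the WCSS for each K (scikit-learn exposes WCSS as the fitted model's inertia_ attribute); the data are an assumption for illustration.

```python
# Elbow-method sketch: compute WCSS for K = 1..10 and look for the bend.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(40, 2))
               for c in ([0, 0], [3, 3], [0, 3])])   # data with three natural groups

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)           # inertia_ = within-cluster sum of squares (WCSS)

for k, w in zip(range(1, 11), wcss):
    print(f"K={k:2d}  WCSS={w:8.2f}")  # the sharp bend in these values suggests the best K
```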
Impact on Network Performance

1. Improved Spectrum Utilization
o By clustering users with similar spectrum requirements, K-means reduces wastage of spectrum resources and ensures optimal usage.
2. Lower Latency
o Efficient clustering minimizes unnecessary spectrum switching, leading to
reduced latency in data transmission.
3. Enhanced Quality of Service (QoS)
o Optimized clustering enhances QoS by reducing interference, improving
throughput, and ensuring seamless connectivity.
4. Better Energy Efficiency
o Cluster-based spectrum allocation reduces the power consumption of base
stations by minimizing redundant transmissions.
5. Limitations and Challenges
o Fixed number of clusters: K-means requires predefining the number of
clusters, which may not be optimal in dynamic networks.
o Sensitivity to Initialization: Poor initialization can lead to suboptimal
clustering.
o Difficulty with Non-Spherical Data: K-means assumes spherical clusters,
which may not always align with real-world network topology.
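To make the spectrum-management use case concrete, the sketch below clusters users by two assumed features, measured SNR and bandwidth demand; the feature choice and values are illustrative assumptions, not figures from the text.

```python
# Illustrative only: group users with similar spectrum requirements using K-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-user features: [SNR in dB, bandwidth demand in Mbps]
users = np.array([[25, 2.0], [27, 1.5], [10, 20.0], [9, 18.0], [18, 8.0], [17, 9.0]])

X = StandardScaler().fit_transform(users)   # put both features on a comparable scale
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(groups)   # users in the same group can share a spectrum-allocation policy
```

Note that n_clusters must be fixed in advance, which is exactly the "fixed number of clusters" limitation highlighted above.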

2. Analyze the role of neural networks in adaptive modulation for wireless


communication and evaluate their impact on system performance and efficiency.

Neural networks are machine learning models that mimic the complex functions of the
human brain. These models consist of interconnected nodes or neurons that process data,
learn patterns, and enable tasks such as pattern recognition and decision-making.
This section explores the fundamentals of neural networks, their architecture, how they work, and their applications in various fields. Understanding neural networks is essential for anyone interested in the advancements of artificial intelligence.
Neural networks are capable of learning and identifying patterns directly from data without
pre-defined rules. These networks are built from several key components:
1. Neurons: The basic units that receive inputs, each neuron is governed by a threshold
and an activation function.
2. Connections: Links between neurons that carry information, regulated by weights and
biases.
3. Weights and Biases: These parameters determine the strength and influence of
connections.
4. Propagation Functions: Mechanisms that help process and transfer data across layers
of neurons.
5. Learning Rule: The method that adjusts weights and biases over time to improve
accuracy.
Learning in neural networks follows a structured, three-stage process:
1. Input Computation: Data is fed into the network.
2. Output Generation: Based on the current parameters, the network generates an
output.
3. Iterative Refinement: The network refines its output by adjusting weights and biases,
gradually improving its performance on diverse tasks.
In an adaptive learning environment:
 The neural network is exposed to a simulated scenario or dataset.
 Parameters such as weights and biases are updated in response to new data or
conditions.
 With each adjustment, the network’s response evolves, allowing it to adapt effectively
to different tasks or environments.
Importance of Neural Networks
Neural networks are pivotal in identifying complex patterns, solving intricate challenges, and
adapting to dynamic environments. Their ability to learn from vast amounts of data is
transformative, impacting technologies like natural language processing, self-driving
vehicles, and automated decision-making.
Neural networks streamline processes, increase efficiency, and support decision-making
across various industries. As a backbone of artificial intelligence, they continue to drive
innovation, shaping the future of technology.
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input neuron in
the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A
neural network can have one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of these
outputs varies depending on the specific task (e.g., classification, regression).
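A minimal sketch of this input/hidden/output layer structure is shown below, assuming NumPy and arbitrarily chosen layer sizes; it builds the parameters and runs one example through the layers, without any training.

```python
# Sketch of an input -> hidden -> output stack (forward pass only, no training).
import numpy as np

n_inputs, n_hidden, n_outputs = 4, 8, 2                 # assumed sizes for illustration
rng = np.random.default_rng(0)

W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))   # input-to-hidden weights
b1 = np.zeros(n_hidden)                                 # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(n_hidden, n_outputs))  # hidden-to-output weights
b2 = np.zeros(n_outputs)                                # output-layer biases

x = rng.normal(size=n_inputs)      # one input example: one value per input-layer neuron
hidden = np.tanh(x @ W1 + b1)      # hidden layer transforms the inputs
output = hidden @ W2 + b2          # output layer produces the model's output
print(output)
```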
Working of Neural Networks
Forward Propagation
When data is input into the network, it passes through the network in the forward direction,
from the input layer through the hidden layers to the output layer. This process is known as
forward propagation. Here’s what happens during this phase:
1. Linear Transformation: Each neuron in a layer receives inputs, which are multiplied by the weights associated with the connections. These products are summed together, and a bias is added to the sum. This can be represented mathematically as: z = w1x1 + w2x2 + … + wnxn + b, where w represents the weights, x represents the inputs, and b is the bias.
2. Activation: The result of the linear transformation (denoted as z) is then passed through an activation function. The activation function is crucial because it introduces non-linearity into the system, enabling the network to learn more complex patterns. Popular activation functions include ReLU, sigmoid, and tanh.
Backpropagation
After forward propagation, the network evaluates its performance using a loss function,
which measures the difference between the actual output and the predicted output. The goal
of training is to minimize this loss. This is where backpropagation comes into play:
1. Loss Calculation: The network calculates the loss, which provides a measure of error
in the predictions. The loss function could vary; common choices are mean squared
error for regression tasks or cross-entropy loss for classification.
2. Gradient Calculation: The network computes the gradients of the loss function with
respect to each weight and bias in the network. This involves applying the chain rule
of calculus to find out how much each part of the output error can be attributed to
each weight and bias.
3. Weight Update: Once the gradients are calculated, the weights and biases are
updated using an optimization algorithm like stochastic gradient descent (SGD). The
weights are adjusted in the opposite direction of the gradient to minimize the loss. The
size of the step taken in each update is determined by the learning rate.
Iteration
This process of forward propagation, loss calculation, backpropagation, and weight update is
repeated for many iterations over the dataset. Over time, this iterative process reduces the
loss, and the network’s predictions become more accurate.
Through these steps, neural networks can adapt their parameters to better approximate the
relationships in the data, thereby improving their performance on tasks such as classification,
regression, or any other predictive modeling.
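The full cycle of forward propagation, loss calculation, backpropagation, and weight updates can be sketched for a tiny one-hidden-layer network. The NumPy code below is an illustrative sketch with assumed sizes, synthetic data, and an assumed learning rate, not a production training routine.

```python
# Forward pass + cross-entropy loss + backpropagation + gradient-descent update, iterated.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                                       # 64 examples, 3 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)     # synthetic binary target

W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))
lr = 0.1                                                           # learning rate (step size)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # Forward propagation: linear transformation plus activation, layer by layer.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Loss calculation: mean cross-entropy between predictions p and targets y.
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Backpropagation: chain-rule gradients of the loss w.r.t. every weight and bias.
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Weight update: step opposite to the gradient, scaled by the learning rate.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(float(loss), 4))   # the loss shrinks as the iterations proceed
```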
Learning with Supervised Learning
In supervised learning, a neural network learns from labeled input-output pairs provided by a
teacher. The network generates outputs based on inputs, and by comparing these outputs to
the known desired outputs, an error signal is created. The network iteratively adjusts its
parameters to minimize errors until it reaches an acceptable performance level.
Types of Neural Networks
The following types of neural networks are commonly used.
 Feedforward Networks: A feedforward neural network is a simple artificial neural
network architecture in which data moves from input to output in a single direction.
 Single-layer Perceptron: A single-layer perceptron consists of only one layer of neurons. It takes inputs, applies weights, sums them up, and uses an activation function to produce an output.
 Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with
three or more layers, including an input layer, one or more hidden layers, and an
output layer. It uses nonlinear activation functions.
 Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is
a specialized artificial neural network designed for image processing. It employs
convolutional layers to automatically learn hierarchical features from input images,
enabling effective image recognition and classification.
 Recurrent Neural Network (RNN): An artificial neural network type intended for
sequential data processing is called a Recurrent Neural Network (RNN). It is
appropriate for applications where contextual dependencies are critical, such as time
series prediction and natural language processing, since it makes use of feedback
loops, which enable information to survive within the network.
 Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to
overcome the vanishing gradient problem in training RNNs. It uses memory cells and
gates to selectively read, write, and erase information.
Advantages of Neural Networks
Neural networks are widely used in many different applications because of their many
benefits:
 Adaptability: Neural networks are useful for activities where the link between inputs
and outputs is complex or not well defined because they can adapt to new situations
and learn from data.
 Pattern Recognition: Their proficiency in pattern recognition makes them effective in tasks such as audio and image identification, natural language processing, and other intricate data patterns.
 Parallel Processing: Because neural networks are capable of parallel processing by
nature, they can process numerous jobs at once, which speeds up and improves the
efficiency of computations.
 Non-Linearity: Neural networks are able to model and comprehend complicated
relationships in data by virtue of the non-linear activation functions found in neurons,
which overcome the drawbacks of linear models.
Disadvantages of Neural Networks
Neural networks, while powerful, are not without drawbacks and difficulties:
 Computational Intensity: Training large neural networks can be a slow and computationally demanding process that requires substantial computing power.
 Black box Nature: As “black box” models, neural networks pose a problem in
important applications since it is difficult to understand how they make decisions.
 Overfitting: Overfitting is a phenomenon in which neural networks commit training
material to memory rather than identifying patterns in the data. Although
regularization approaches help to alleviate this, the problem still exists.
 Need for Large datasets: For efficient training, neural networks frequently need
sizable, labeled datasets; otherwise, their performance may suffer from incomplete or
skewed data.

How Neural Networks Improve Adaptive Modulation

1. Channel Prediction and Estimation


o NNs, particularly deep learning models (CNNs, LSTMs, and RNNs), can analyze
historical channel conditions and predict future channel states.
o This predictive capability allows modulation schemes to be adjusted before
degradation occurs, improving robustness.
2. Fast and Accurate Decision-Making
o Traditional AM relies on predefined thresholds of signal-to-noise ratio (SNR), which
may not be optimal in non-stationary environments.
o Neural networks dynamically adjust thresholds based on real-time conditions,
reducing latency and improving efficiency.
3. Nonlinear Channel Adaptation
o Wireless channels are affected by multipath fading, Doppler shifts, and
interference.
o Unlike classical algorithms that assume linearity, deep NNs can learn non-linear
relationships and optimize modulation accordingly.
4. Reduction in Bit Error Rate (BER)
o By intelligently switching modulation schemes, NN-based AM minimizes BER while
maximizing throughput.
o It ensures that lower-order modulations (e.g., BPSK, QPSK) are used in poor
channel conditions while switching to higher-order modulations (e.g., 16-QAM, 64-
QAM) in good conditions.
5. Energy Efficiency Optimization
o NN-based AM reduces power consumption by minimizing unnecessary high-power
transmissions.
o It enables green communication by optimizing transmission power and modulation
selection based on real-time demand.
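As a hedged illustration of the idea (not a standardized scheme), the sketch below trains a small neural network to map an estimated SNR to a modulation order; the SNR thresholds used to label the synthetic training data are assumptions chosen only for this example.

```python
# Illustrative sketch: learn a mapping from channel SNR to a modulation scheme.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
snr_db = rng.uniform(0, 30, size=(2000, 1))               # synthetic SNR estimates

# Assumed labelling rule for the synthetic data: BPSK/QPSK/16-QAM/64-QAM by SNR band.
schemes = np.digitize(snr_db.ravel(), bins=[6, 12, 20])   # class indices 0..3

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
model.fit(snr_db, schemes)

names = ["BPSK", "QPSK", "16-QAM", "64-QAM"]
for s in [3.0, 10.0, 25.0]:
    print(s, "dB ->", names[int(model.predict([[s]])[0])])  # predicted modulation choice
```

In practice the input would be a richer set of channel features, and the labels would be driven by measured BER or throughput, but the structure (channel features in, modulation decision out) is the same.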

Impact on System Performance and Efficiency


Metric | Traditional Adaptive Modulation | NN-Based Adaptive Modulation
Latency | High (threshold-based) | Low (real-time optimization)
Bit Error Rate (BER) | Moderate | Lower (optimized selection)
Spectral Efficiency | Limited (fixed rules) | Higher (dynamic adaptation)
Energy Efficiency | Suboptimal | Improved (adaptive power control)
Robustness | Weak in fast-varying channels | Strong (deep learning models)

Challenges and Considerations

 Computational Complexity: Implementing deep learning models requires higher processing power, which may be a concern for low-power IoT devices.
 Data Requirements: Training NNs requires large datasets of channel conditions and
modulation outcomes, making real-world deployment complex.
 Generalization: Overfitting to specific environments may reduce adaptability in different
network conditions. Transfer learning and reinforcement learning (RL) approaches can
address this issue.

3. Analyze how linear regression is used in signal prediction for communication systems
and evaluate its effectiveness in minimizing transmission errors.

4.2.1 Linear Regression


Linear regression is a supervised machine-learning algorithm that learns from labelled datasets and maps the data points to the most optimized linear function, which can then be used for prediction on new datasets. It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data, and it predicts a continuous output variable from the independent input variables.
For example, if we want to predict a house price we consider various factors such as the house's age, distance from the main road, location, area, and number of rooms; linear regression uses all these parameters to predict the house price, as it assumes a linear relationship between these features and the price of the house.
A sloped straight line represents the linear regression model.
Best Fit Line for a Linear Regression Model
In the above figure,
X-axis = Independent variable
Y-axis = Output / dependent variable
Line of regression = Best fit line for a model
Here, a line is plotted that suitably fits all the given data points. Hence, it is called the 'best fit line.' The goal of the linear regression algorithm is to find this best-fit line, as seen in the above figure.
Key benefits of linear regression
Linear regression is a popular statistical tool used in data science, thanks to the several
benefits it offers, such as:
1. Easy implementation
The linear regression model is computationally simple to implement as it does not demand a
lot of engineering overheads, neither before the model launch nor during its maintenance.
2. Interpretability
Unlike other deep learning models (neural networks), linear regression is relatively
straightforward. As a result, this algorithm stands ahead of black-box models that fall short in
justifying which input variable causes the output variable to change.
3. Scalability
Linear regression is not computationally heavy and, therefore, fits well in cases where scaling
is essential. For example, the model can scale well regarding increased data volume (big
data).
4. Optimal for online settings
The ease of computation of these algorithms allows them to be used in online settings. The
model can be trained and retrained with each new example to generate predictions in real-
time, unlike the neural networks or support vector machines that are computationally heavy
and require plenty of computing resources and substantial waiting time to retrain on a new
dataset.
Linear Regression Equation
Let’s consider a dataset that covers RAM sizes and their corresponding costs.
In this case, the dataset comprises two distinct features: memory (capacity) and cost. The
more RAM, the more the purchase cost of RAMs.

Dataset: RAM Capacity vs. Cost


If we plot RAM on the X-axis and its cost on the Y-axis, a line from the lower-left corner of
the graph to the upper right represents the relationship between X and Y. On plotting these
data points on a scatter plot, we get the following graph:

Scatter Plot: RAM Capacity vs. Cost


The memory-to-cost ratio may vary according to different manufacturers and RAM versions, but the data trend shows a pattern. The data points at the bottom left correspond to cheaper RAMs with smaller memory, and the trend continues to the upper right corner of the graph, where the RAMs are of higher capacity and are costly.
The regression model defines a linear function between the X and Y variables that best
showcases the relationship between the two. It is represented by the slant line seen in the
above figure, where the objective is to determine an optimal ‘regression line’ that best fits all
the individual data points.
Mathematically, these slant lines follow the equation
Y = m*X + b
where X = independent variable (input), Y = dependent variable (target), m = slope of the line, and b = the Y-intercept.
Machine learning experts use a slightly different notation for the above slope-line equation,
y(x) = p0 + p1 * x
where,
 y = output variable. Variable y represents the continuous value that the model tries to
predict.
 x = input variable. In machine learning, x is the feature, while it is termed the
independent variable in statistics. Variable x represents the input information provided
to the model at any given time.
 p0 = y-axis intercept (or the bias term).
 p1 = the regression coefficient or scale factor. In classical statistics, p1 is the
equivalent of the slope of the best-fit straight line of the linear regression model.
 pi = weights (in general).
Thus, regression modeling is all about finding the values for the unknown parameters of the
equation, i.e., values for p0 and p1 (weights).
The equation for multiple linear regression
The above process applies to simple linear regression having a single feature or independent
variable. However, a regression model can be used for multiple features by extending the
equation for the number of variables available within the dataset.
The equation for multiple linear regression is similar to the equation for a simple linear
equation, i.e., y(x) = p0 + p1x1 plus the additional weights and inputs for the different
features which are represented by p(n)x(n). The formula for multiple linear regression would
look like,
y(x) = p0 + p1x1 + p2x2 + … + p(n)x(n)
The machine learning model uses the above formula with different weight values to draw candidate lines. To determine which line best fits the data, the model evaluates different weight combinations and selects the one that establishes the strongest relationship between the variables.
Furthermore, along with the prediction function, the regression model uses a cost function to
optimize the weights (pi). The cost function of linear regression is the root mean squared
error or mean squared error (MSE).
Fundamentally, MSE measures the average squared difference between the observation’s
actual and predicted values. The output is the cost or score associated with the current set of
weights and is generally a single number. The objective here is to minimize MSE to boost the
accuracy of the regression model.
Math
Given the simple linear equation y = m*x + b, the MSE is calculated as:

MSE = (1/N) * Σ(i = 1 to N) (yi − (m*xi + b))²

Where,
 N = total number of observations (data points)
 (1/N) Σ(i = 1 to N) = the mean over all observations
 yi = actual value of an observation
 mxi + b = the predicted value
Along with the cost function, a ‘Gradient Descent’ algorithm is used to minimize MSE and
find the best-fit line for a given training dataset in fewer iterations, thereby improving the
overall efficiency of the regression model.
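A minimal sketch of this procedure, fitting y = m*x + b by gradient descent on the MSE with NumPy and synthetic data (the slope, intercept, and learning rate below are assumptions for illustration):

```python
# Gradient-descent fit of y = m*x + b by minimizing the mean squared error (MSE).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.5 * x + 4.0 + rng.normal(scale=1.0, size=200)   # synthetic data, known slope/intercept

m, b, lr = 0.0, 0.0, 0.01                             # initial weights and learning rate
for _ in range(5000):
    error = (m * x + b) - y                           # prediction minus actual value
    # Gradients of MSE = (1/N) * sum(error^2) with respect to m and b.
    dm = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    m -= lr * dm                                      # step opposite to the gradient
    b -= lr * db

print(round(m, 2), round(b, 2))                       # should end up close to 2.5 and 4.0
```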
The equation for linear regression can be visualized as:

Visualization of Equation for Linear Regression


Types of Linear Regression:
Linear regression has been a critical driving force behind many AI and data
science applications. The types of linear regression models include:
1. Simple linear regression
Simple linear regression reveals the relationship between an independent variable (input) and a dependent variable (output). Primarily, this regression type describes the following:
 Relationship strength between the given variables.
2. Multiple linear regression
Multiple linear regression establishes the relationship between independent variables (two or
more) and the corresponding dependent variable. Here, the independent variables can be
either continuous or categorical. This regression type helps foresee trends, determine future
values, and predict the impacts of changes.
3. Logistic regression
Logistic regression—also referred to as the logit model—is applicable in cases where there is
one dependent variable and more independent variables. The fundamental difference between
multiple and logistic regression is that the target variable in the logistic approach is discrete
(binary or an ordinal value). Implying, the dependent variable is finite or categorical–either P
or Q (binary regression) or a range of limited options P, Q, R, or S.
4. Ordinal regression
Ordinal regression involves a dependent variable with multiple ordered levels and one or more independent variables, which can be either ordinal or nominal. It models the relationship between the ordered dependent variable and the independent variables.
5. Multinomial logistic regression
Multinomial logistic regression (MLR) is performed when the dependent variable is nominal
with more than two levels. It specifies the relationship between one dependent nominal
variable and one or more continuous-level (interval, ratio, or dichotomous) independent
variables. Here, the nominal variable refers to a variable with no intrinsic ordering.

4. Compare and analyze the role of logistic regression and decision trees in classification
tasks within wireless communication networks. Which method provides better
accuracy and why?

4.2.2 Logistic Regression


Logistic regression is a supervised machine learning algorithm used for classification tasks where the goal is to predict the probability that an instance belongs to a given class or not. Logistic regression is a statistical algorithm which analyzes the relationship between two data factors. This section explores the fundamentals of logistic regression, its types, and its evaluation.

Logistic regression is used for binary classification where we use sigmoid function, that takes
input as independent variables and produces a probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
Key Points:
 Logistic regression predicts the output of a categorical dependent variable. Therefore,
the outcome must be a categorical or discrete value.
 It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact
value as 0 and 1, it gives the probabilistic values which lie between 0 and 1.
 In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types
of dependent variables, such as “low”, “Medium”, or “High”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding them is important to ensure appropriate application of the model. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning there is no correlation between observations.
2. Binary dependent variables: It takes the assumption that the dependent variable must
be binary or dichotomous, meaning it can take only two values. For more than two
categories SoftMax functions are used.
3. Linearity relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should
be linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size is sufficiently large
Sigmoid Function
So far, we’ve covered the basics of logistic regression, but now let’s focus on the most
important function that forms the core of logistic regression.
 The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
 It maps any real value into another value within a range of 0 and 1. The value of the
logistic regression must be between 0 and 1, which cannot go beyond this limit, so it
forms a curve like the “S” form.
 The S-form curve is called the Sigmoid function or the logistic function.
Working of Logistic Regression:
The logistic regression model transforms the continuous-valued output of the linear regression function into a categorical output using the sigmoid function, which maps any real-valued combination of the independent input variables to a value between 0 and 1. This function is known as the logistic function.
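A minimal sketch of the sigmoid mapping and the 0.5 threshold described above, assuming NumPy and an arbitrary pair of illustrative weights:

```python
# The sigmoid squashes any real-valued score into (0, 1); thresholding at 0.5 gives the class.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.2, -0.7])           # assumed, illustrative weights for two features
b = 0.3                             # assumed bias

x = np.array([0.5, 2.0])            # one input example
prob = sigmoid(x @ w + b)           # probability that the example belongs to Class 1
label = int(prob > 0.5)             # 1 if the probability exceeds the 0.5 threshold, else 0
print(round(float(prob), 3), label)
```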
Evaluation in Logistic Regression Model:
So far, we've covered how logistic regression works. Now, let's turn to the evaluation of the logistic regression model and why it is important. Evaluating the model helps us assess its performance and ensure it generalizes well to new data.
We can evaluate the logistic regression model using the following metrics:
 Accuracy: Accuracy provides the proportion of correctly classified instances.
Accuracy = (TruePositives + TrueNegatives) / Total
 Precision: Precision focuses on the accuracy of positive predictions.
Precision = TruePositives / (TruePositives + FalsePositives)
 Recall (Sensitivity or True Positive Rate): Recall measures the proportion of
correctly predicted positive instances among all actual positive instances.
Recall = TruePositives / (TruePositives + FalseNegatives)

 F1 Score: F1 score is the harmonic mean of precision and recall.
F1 Score = 2 ∗ ((Precision ∗ Recall) / (Precision + Recall))

 Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC
curve plots the true positive rate against the false positive rate at various
thresholds. AUC-ROC measures the area under this curve, providing an aggregate
measure of a model’s performance across different classification thresholds.
 Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, AUC-
PR measures the area under the precision-recall curve, providing a summary of a
model’s performance across different precision-recall trade-offs.
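These metrics follow directly from counts of true/false positives and negatives. A small sketch with made-up labels (purely illustrative) follows:

```python
# Compute accuracy, precision, recall, and F1 from illustrative true and predicted labels.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, round(precision, 2), round(recall, 2), round(f1, 2))
```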

4.2.3 Decision Trees


The LEARN-DECISION-TREE algorithm adopts a greedy divide-and-conquer strategy:
always test the most important attribute first, then recursively solve the smaller subproblems
that are defined by the possible results of the test. By “most important attribute,” we mean the
one that makes the most difference to the classification of an example. That way, we hope to
get to the correct classification with a small number of tests, meaning that all paths in the tree
will be short and the tree as a whole will be shallow.
Figure 19.4(a) shows that Type is a poor attribute, because it leaves us with four possible outcomes, each of which has the same number of positive as negative examples. On the
the other hand, in (b) we see that Patrons is a fairly important attribute, because if the value is
None or Some, then we are left with example sets for which we can answer definitively (No
and Yes, respectively). If the value is Full, we are left with a mixed set of examples. There are
four cases to consider for these recursive subproblems:
1. If the remaining examples are all positive (or all negative), then we are done: we can
answer Yes or No. Figure 19.4(b) shows examples of this happening in the None and Some
branches.
2. If there are some positive and some negative examples, then choose the best attribute to
split them. Figure 19.4(b) shows Hungry being used to split the remaining examples.
3. If there are no examples left, it means that no example has been observed for this
combination of attribute values, and we return the most common output value from the set of
examples that were used in constructing the node’s parent.
4. If there are no attributes left, but both positive and negative examples, it means that these examples have exactly the same description but different classifications. This can happen because there is an error or noise in the data; because the domain is nondeterministic; or because we can't observe an attribute that would distinguish the examples. The best we can do is return the most common output value of the remaining examples.
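The four cases map almost directly onto a recursive function. The sketch below is a simplified, illustrative Python version of that recursive structure; the importance function is left as a parameter (a placeholder for the information-gain measure the textbook algorithm uses), and examples are assumed to be (attribute-dict, label) pairs.

```python
# Simplified sketch of the LEARN-DECISION-TREE recursion (illustrative, not the full algorithm).
from collections import Counter

def plurality_value(examples):
    """Most common output label among the given examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_decision_tree(examples, attributes, parent_examples, importance):
    if not examples:                     # case 3: no examples for this attribute combination
        return plurality_value(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                 # case 1: all positive or all negative
        return labels.pop()
    if not attributes:                   # case 4: attributes exhausted but labels still mixed
        return plurality_value(examples)
    # case 2: split on the most important remaining attribute
    best = max(attributes, key=lambda a: importance(a, examples))
    tree = {best: {}}
    for value in {features[best] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = learn_decision_tree(subset, rest, examples, importance)
    return tree
```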

The learning algorithm has no reason to include tests for Raining and Reservation, because it
can classify all the examples without them. It has also detected an interesting and previously
unsuspected pattern: SR will wait for Thai food on weekends. It is also bound to make some
mistakes for cases where it has seen no examples.
We can evaluate the performance of a learning algorithm with a learning curve, as shown in Figure 19.7. For this figure we have 100 examples at our disposal, which we split randomly into a training set and a test set. We learn a hypothesis h with the training set and measure its accuracy with the test set. We can do this starting with a training set of size 1 and increasing one at a time up to size 99. For each size, we actually repeat the process of randomly splitting into training and test sets 20 times, and average the results of the 20 trials. The curve shows that as the training set size grows, the accuracy increases. (For this reason, learning curves are also called happy graphs.) In this graph we reach 95% accuracy, and it looks as if the curve might continue to increase if we had more data.
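A hedged sketch of this learning-curve procedure, random train/test splits of increasing size, repeated and averaged, is shown below using scikit-learn's decision tree on a synthetic 100-example dataset (the dataset itself is an assumption for illustration):

```python
# Learning-curve sketch: test accuracy vs. training-set size, averaged over 20 random splits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=6, random_state=0)   # 100 examples

for train_size in [10, 25, 50, 75, 90]:
    scores = []
    for trial in range(20):                  # repeat the random split 20 times and average
        Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=train_size, random_state=trial)
        clf = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
        scores.append(clf.score(Xte, yte))
    print(train_size, round(float(np.mean(scores)), 3))   # accuracy rises with training size
```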

3. Accuracy Comparison: Logistic Regression vs. Decision Trees


Criterion | Logistic Regression | Decision Trees
Accuracy on Linearly Separable Data | ✅ High | ❌ Moderate
Accuracy on Non-Linear Data | ❌ Low | ✅ High
Interpretability | ✅ High | ✅ High
Scalability to Large Datasets | ✅ High | ❌ Moderate
Robustness to Noise | ❌ Low | ✅ High
Overfitting Risk | ✅ Low | ❌ High

Which Provides Better Accuracy?

 For linearly separable problems (e.g., binary signal detection, basic spectrum
sensing), Logistic Regression performs well.
 For complex, nonlinear classification tasks (e.g., modulation classification,
interference detection), Decision Trees outperform Logistic Regression due to
their ability to capture intricate relationships.
 Decision Trees are preferred when feature interactions are crucial and data has
a nonlinear structure.
 Logistic Regression is better suited for simpler problems where interpretability
and computational efficiency are critical.
