Final Thesis
BS Thesis
by
Muhammad Waqar Zain
Arfa Farooq
CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR
Fall 2023
Machine Learning From Roots To Networks
In partial fulfillment
of the requirement for the degree of
Bachelor of Science
in
Mathematics
by
Muhammad Waqar Zain
Arfa Farooq
CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR
Department of Mathematics
Faculty of Science
Fall 2023
Machine Learning From Roots To Networks
Supervisory Committee
Supervisor Member
Member Member
Name Dr.
Associate Professor Associate Professor
Department of Mathematics Department of Mathematics
COMSATS University Islamabad (CUI) COMSATS University Islamabad (CUI)
Lahore Campus Lahore Campus
Certificate of Approval
Machine Learning From Roots To Networks
External Examiner:
Supervisor:
Head of Department:
Declaration
Date:
Certificate
Date:
Supervisor
Dedication
Acknowledgements
First and foremost, we would like to thank ALLAH Almighty (the most beneficent and
most merciful) for giving us the strength, knowledge, ability and opportunity to undertake
this research study and to persevere and complete it satisfactorily. Without the countless
blessings of ALLAH Almighty, this achievement would not have been possible. May His peace
and blessings be upon His messenger Hazrat Muhammad (PBUH), upon his family, com-
panions and whoever follows him. Our insightful gratitude to Hazrat Muhammad (PBUH),
who is forever a torch of guidance and knowledge for humanity as a whole. In our journey
towards this degree, we have found a teacher, an inspiration, a role model and a pillar of
support in our life.
M. Waqar Zain
Arfa Farooq
CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR
Abstract
This work provides a foundational exploration into the realm of machine learning, covering
essential concepts and methodologies. The study begins with an introduction to machine
learning, delineating its significance in contemporary technology. It then delves into the
classification of machine learning into various types. The investigation includes an in-depth
analysis of prominent supervised learning algorithms, such as regression, decision trees,
support vector machines (SVM), and neural networks. Each algorithm is scrutinized for
its characteristics, applications, and underlying principles. By comprehensively addressing
these core components, this work aims to furnish a solid understanding of the fundamental
aspects of machine learning and its diverse applications.
Table of Contents
3.3.1 Applications of Multilinear Regression
3.4 Decision Trees
3.4.1 Components of Decision Trees
3.5 Implementation of Decision Trees
3.6 Conclusion
5 Neural Networks
5.1 Definition:
5.2 Applications of Neural Networks:
5.3 Structure of neural networks:
5.3.1 Components of neural networks:
5.4 Working of neural networks:
5.4.1 Forward propagation:
5.4.2 Backward propagation:
5.5 Implementation of neural networks:
5.6 Learning in neural networks:
5.7 Types of neural networks:
5.8 Advantages of neural networks:
5.9 Disadvantages of neural networks:
5.10 Conclusion:
6 Problem Solving
6.1 Problem 01
6.2 Problem 02
6.3 Problem 03
6.4 Problem 04
6.5 Problem 05
References
List of Figures
Figure 6.4 Results
Figure 6.5 GridsearchCV Model
Figure 6.6 GridsearchCV Model
Figure 6.7 Results
Figure 6.8 Pipeline Model
Figure 6.9 Pipeline Model
Figure 6.10 Results
Figure 6.11 Single Pipeline
Figure 6.12 Single Pipeline
Figure 6.13 Single Pipeline
Figure 6.14 Single Pipeline
Figure 6.15 Results
Figure 6.16 Model for GridsearchCV
Figure 6.17 Model for GridsearchCV
Figure 6.18 Results
List of Tables
Chapter 1
The Machine Learning Landscape
1.1 Machine learning:
Machine learning is the science and art of programming computers so they can learn from
data.
Here is a slightly more general definition:
Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed.
Mathematically, machine learning is defined as follows. Let
S = {(x1, y1), (x2, y2), . . . , (xn, yn)}
be the training dataset, consisting of n training examples. Here, xi denotes the input or
features of the i-th example, and yi represents the corresponding output or target value.
The goal of machine learning is to find a function or model f that maps inputs to outputs,
such that f(xi) ≈ yi for all (xi, yi) ∈ S.
Machine learning is about extracting knowledge from data. It is a research field at the
intersection of statistics and computer science. It is also known as predictive analytics or
statistical learning.
Example:
A spam filter, whose job is to move inappropriate incoming email messages to a spam
folder. We could make up a blacklist of words that would result in an email being marked
as spam.
Figure 1.1: Email Spamming
Training:
A training set is the collection of examples that the system uses to learn. Training sets are
a fundamental concept in machine learning and artificial intelligence: a training set is a
collection of data used to train a machine learning model. They are like textbooks and
exercise books for AI models. When we want to teach a model to perform a task, we expose
it to lots of examples. These are the training sets. Each example consists of input data and
the correct output, and the model learns to make predictions based on this input-output
relationship. It's like a teacher showing a student a bunch of math problems along with the
correct answers, so the student learns how to solve similar problems on their own.
Samples:
Each training example is called a training instance or sample. A sample is essentially an
individual data entry within the training set. It could be an image, a text document, a set of
features, or any other unit of data depending on the type of problem. During the training
process, the machine learning model learns patterns and relationships from these samples
in the training set. Once trained, the model can then make predictions or classifications on
new, unseen data. The quality and representativeness of the training set are crucial factors
in the performance of the trained model.
Model:
The part of a machine learning system that learns and makes predictions is called a model.
A model is essentially a mathematical representation or framework that captures patterns,
relationships, and structures within the data. The purpose of a machine learning model is to
make predictions or decisions without being explicitly programmed. A machine learning
model is a computational representation of patterns in data that has been trained to make
predictions or decisions. There are various types of a machine learning models, including
linear regression, decision trees, support vector machines, neural networks, and many oth-
ers. The choice of model depends on the nature of the data.
Example:
Neural networks and random forests.
Digging into large amounts of data to discover hidden patterns is called data mining. Data
mining is a crucial component of machine learning, and it involves the process of discover-
ing patterns, trends, and valuable information from large datasets. It is more about extract-
ing knowledge from data. Data mining is a complementary process to machine learning,
providing the foundational steps of exploring, preprocessing, and extracting knowledge
from data. The insights gained through data mining inform the selection of appropriate
machine learning models and features, contributing to the overall success of the predictive
or decision-making tasks.
Chapter 2
Machine Learning And Its Types
There are so many different types of machine learning systems that it is useful to classify
them in broad categories, based on the following criteria:
• How they are supervised during training. It includes supervised, unsupervised, semi-
supervised, self-supervised, and others.
• Whether or not they can learn incrementally on the fly. It includes online and batch learn-
ing.
• Whether they work by simply comparing new data points to known data points, or in-
stead by detecting patterns in the training data and building a predictive model. It includes
instance-based and model-based learning.
2.1 Supervised Machine Learning
Supervised learning is the type of machine learning in which machines are trained using
well-labelled training data, and based on that data, machines predict the output. In
supervised learning, the algorithm is provided with a training set, which consists of input-output
pairs. The algorithm learns from this training data by adjusting its parameters or weights
based on the input-output relationships. The training process involves iteratively refining
the model until it can generalize well to new, unseen data. There are two main types of
supervised machine learning:
Figure 2.1: Types of Supervised Learning
Regression:
In regression, the algorithm learns to predict a continuous output variable. For example,
predicting house prices based on features like square footage, number of bedrooms, and
location.
Classification:
In classification tasks, the algorithm learns to assign input data to discrete classes or cat-
egories. Examples include spam detection in emails, image classification, or predicting
whether a customer will return or not. Supervised learning is widely used in various fields,
such as finance, health care, natural language processing, computer vision, and many
others, due to its ability to make predictions or classifications based on labeled data. In
supervised learning, the training set we feed to the algorithm includes the desired solutions,
called labels. Labelled data means that the input data is already tagged with the correct
output.
Figure 2.2: Linear Regression Code In Python
This linear regression example predicts the salaries of teachers on the basis of their
experience.
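Figure 2.2 shows such a script as a screenshot. A minimal sketch of the same idea, assuming scikit-learn is available (the experience and salary figures below are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (input x) and salary (target y)
experience = np.array([[1], [2], [3], [5], [8], [10]])
salary = np.array([30000, 34000, 39000, 48000, 61000, 70000])

# Fit the model: learn the slope m and intercept b of y = mx + b
model = LinearRegression()
model.fit(experience, salary)

# Predict the salary of a teacher with 6 years of experience
print(model.predict([[6]]))
print(model.coef_, model.intercept_)  # learned slope and intercept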
Figure 2.3: Real life example of Linear regression in python
2.2 Unsupervised Machine Learning
In unsupervised learning, the training data is unlabeled. The system tries to learn without
a teacher. Unsupervised learning is a type of machine learning where the algorithm is
given input data without explicit output labels. The goal of unsupervised learning is to
explore the inherent structure or patterns within the data. Unlike supervised learning, there
is no provided "teacher" or labeled target for the algorithm to learn from. Instead, the
algorithm tries to find hidden patterns or relationships on its own. There are two main
types of unsupervised learning tasks:
Example:
Consider a dataset containing information about teachers’ salaries, including features such
as teacher ID, age, years of experience, and income. In unsupervised learning, the goal is
to explore the data, uncover patterns, and extract meaningful insights without relying on
labeled information or predefined objectives.
Clustering:
In clustering, the algorithm groups similar data points together based on certain features or
characteristics, forming clusters. The goal is to identify natural groupings in the data with-
out any prior knowledge of the categories. Common clustering algorithms include k-means
clustering and hierarchical clustering.
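As a brief illustration, here is a minimal k-means sketch using scikit-learn; the two-dimensional points below are invented for the example:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two loose groups of 2-D points
points = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
                   [8.0, 8.5], [8.3, 8.0], [7.8, 8.8]])

# Ask k-means for two clusters; note that no labels are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(kmeans.labels_)           # cluster assignment of each point
print(kmeans.cluster_centers_)  # coordinates of the two cluster centers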
Dimensionality Reduction:
In dimensionality reduction techniques, the aim is to reduce the number of features or vari-
ables in the data while preserving its important information. It is useful for simplifying
complex datasets and extracting the most relevant features. Principal Component Anal-
ysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) are examples of
dimensionality reduction methods. Unsupervised learning is often used in scenarios where
labeled data is scarce or expensive to obtain. It helps to discover patterns, associations,
and structures in the data that may not be apparent initially. Some applications of unsuper-
vised learning include customer segmentation, anomaly detection, and feature extraction
for subsequent supervised learning tasks. It is important to note that the distinction be-
tween supervised and unsupervised learning is not always strict, and some methods, like
semi-supervised learning, combine elements of both paradigms.
Example: visualization algorithms, which project high-dimensional data into two or three
dimensions so that it can be plotted and inspected.
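A minimal sketch of dimensionality reduction for visualization, using PCA from scikit-learn on its built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Reduce the 4-feature iris data to 2 dimensions for plotting
X = load_iris().data
X_2d = PCA(n_components=2).fit_transform(X)

print(X.shape, "->", X_2d.shape)  # (150, 4) -> (150, 2)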
2.3 Semi-Supervised Machine Learning
Semi-supervised learning combines a small amount of labeled data with a large amount of
unlabeled data during training: the labels guide the learning, while the unlabeled examples
allow the algorithm to explore the data more broadly and potentially discover hidden
structures. There are various techniques used in semi-supervised learning, including:
Self-training:
The model is initially trained on the labeled data, and then it makes predictions on the un-
labeled data. The confident predictions on the unlabeled data are then added to the labeled
dataset, and the model is retrained.
Co-training:
The algorithm is trained on multiple views or representations of the data. Each view might
have a different set of features, and the model is trained on one set of features at a time.
The agreement between predictions on the labeled data helps in improving performance.
Multi-view learning:
Like co-training, but the different views of the data are considered independently, and mod-
els are trained on each view separately. Semi-supervised learning is applied in scenarios
where acquiring labeled data is challenging, such as in medical imaging, where labeling re-
quires expert annotation, or in natural language processing, where labeling large amounts
of text data can be time-consuming. It aims to take advantage of both labeled and unlabeled
data to build more robust and accurate models.
Another approach to machine learning involves generating a fully labeled data set from a
fully unlabeled one. Once the whole data set is labeled, any supervised learning algorithm
can be used. This approach is called self-supervised learning. Self-supervised learning is a
machine learning paradigm where the algorithm learns from the data itself without explicit
external labels. In self-supervised learning, the algorithm generates its own supervisory
signal or labels from the input data, creating a surrogate or proxy task that the model aims
to solve. The goal is to leverage the inherent structure or relationships within the data to en-
able learning without the need for human-annotated labels. Key features of self-supervised
learning include:
Proxy Tasks:
Instead of relying on external labels, self-supervised learning involves defining proxy tasks
that are derived from the input data. These tasks are designed to be solvable by the model
using the inherent information present in the data.
Data Augmentation:
Self-supervised learning often involves creating variations or augmentations of the input
data. The model is then trained to predict or generate the original data from its augmented
versions. Self-supervised learning has gained popularity due to its ability to learn useful
representations from large amounts of unlabeled data. It is widely used in computer vi-
sion, natural language processing, and other domains where obtaining labeled data can be
expensive or impractical. The learned representations can then be fine-tuned on specific
downstream tasks using smaller amounts of labeled data.
Reinforcement learning is a very different beast. The learning system, called an agent, can
observe the environment, select and perform actions, and get rewards in return or penalties
in the form of negative rewards. It must then learn by itself what is the best strategy,
called a policy, to get the most reward over time. A policy defines what action the agent
should choose when it is in each situation. Reinforcement Learning (RL) is a type of
machine learning paradigm where an agent learns to make decisions by interacting with an
environment. The agent receives feedback in the form of rewards or penalties based on the
actions it takes, and its goal is to learn a strategy or policy that maximizes the cumulative
reward over time. Key components of reinforcement learning include:
Figure 2.5: Reinforcement Learning
Agent:
The entity that makes decisions and takes actions in the environment. The objective of the
agent is to learn the optimal strategy to maximize cumulative rewards.
Environment:
It is the external system or context with which the agent interacts. The environment defines
the state of the system and provides feedback to the agent in the form of rewards or penal-
ties.
State:
A state is the representation of the current situation or configuration of the environment.
The state contains all the relevant information which is needed to make decisions.
Action:
The action is the set of possible moves or decisions that the agent can take in each state.
These actions influence the subsequent state of the environment.
Reward:
A reward is the numerical value that the agent receives as feedback from the environment
after taking a particular action in a specific state. The goal of the agent is to maximize the
cumulative reward over time.
Policy:
The policy is the strategy or set of rules that the agent uses to determine its actions in dif-
ferent states. The objective is to learn an optimal policy that leads to the highest cumulative
reward. Reinforcement learning involves the agent interacting with the environment over
multiple time steps, learning from its experiences, and adjusting its policy to improve per-
formance. Popular reinforcement learning algorithms include Q-learning, Deep Q Network
(DQN), Policy Gradient methods, and more recently, algorithms based on deep neural net-
works, such as Proximal Policy Optimization (PPO) and Trust Region Policy Optimization
(TRPO).
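To make the reward-driven update concrete, here is a minimal sketch of the tabular Q-learning update rule; the environment size, the single experience, and the hyperparameters are invented for illustration:

import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))  # table of estimated action values
alpha, gamma = 0.1, 0.9              # learning rate and discount factor

def update(state, action, reward, next_state):
    # Move Q(s, a) toward the reward plus the discounted best future value
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

# One hypothetical experience: in state 0, action 1 earned reward 1.0 and led to state 2
update(state=0, action=1, reward=1.0, next_state=2)
print(Q)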
Reinforcement learning has applications in various domains, including robotics, game play-
ing, autonomous systems, finance, and more. It has been notably successful in training
agents to play complex games like Go and Poker, as well as in optimizing control strategies
for robotics and autonomous vehicles.
Batch learning, also known as offline learning or batch training, is a machine learning
paradigm where a model is trained on a complete dataset at once. In batch learning, the
entire dataset, including both input features and corresponding labels, is used to update the
parameters of the model in a single iteration. In batch learning, the system is incapable
of learning incrementally, that is, it must be trained using all the available data. This will
generally take a lot of time and computing resources, so it is typically done offline. First
the system is trained, and then it is launched into production and runs without learning any-
more. It just applies what it has learned. This is called offline learning. Key characteristics
of batch learning include:
Training on the Entire Dataset:
In batch learning, the model sees the entire dataset during each iteration of the training
process. The model computes the gradients and updates its parameters using the average
of the gradients calculated over the entire dataset.
Offline Processing:
Batch learning is often used in offline or batch processing scenarios, where the entire
dataset is available before training begins. This is common in scenarios where data can
be collected and processed in bulk, rather than in a streaming or real-time fashion.
Computationally Intensive:
Training on the entire data set at once can be computationally intensive, especially when
dealing with large datasets. However, it may also be more efficient in terms of hardware
utilization, as the processing can be optimized for matrix operations.
Iterative Optimization:
Batch learning typically involves multiple iterations over the entire dataset. The parame-
ters of the model are updated iteratively to minimize a predefined loss function until con-
vergence. Batch learning is suitable for scenarios where the dataset fits into memory, and
computational resources are sufficient to process the entire dataset in one go. It is com-
monly used in tasks such as model training for offline analytics, where the focus is on
optimizing the model based on the available historical data. While batch learning has its
advantages, it may not be well-suited for applications where data is constantly streaming
in, as it requires periodic retraining with updated datasets. In contrast, online learning is
more suitable for scenarios where the model needs to adapt to changing data over time.
Online learning, also known as incremental learning, streaming learning, or online machine
learning, is a machine learning paradigm where a model is updated continuously as new
data becomes available. In online learning, the model processes data one observation at
a time, updating its parameters incrementally. In online learning, we train the system in-
crementally by feeding it with data instances sequentially, either individually or in small
groups called mini batches. Each learning step is fast and cheap, so the system can learn
about new data on the fly, as it arrives. Key characteristics of online learning include:
Sequential Processing:
Data is processed in a sequential manner, with the model learning from one observation at
a time. This allows the model to adapt to new patterns or changes in the data as they occur.
Continuous Update:
The model is updated incrementally with each new data point. The updates can be
performed iteratively, with the model adjusting its parameters to minimize a predefined
loss function based on the most recent data.
Adaptability to Changing Data:
Online learning is particularly useful in scenarios where the underlying patterns in the data
may change over time. The model can continuously adapt to new trends and patterns with-
out the need to retrain the entire dataset.
Real-time Processing:
Online learning is well-suited for applications that require real-time decision-making or
analysis, as the model can be updated with each new piece of data as it arrives.
Online learning algorithms include:
Stochastic Gradient Descent (SGD):
It is a popular optimization algorithm used in online learning. It updates the model param-
eters based on the gradient of the loss function computed for each individual data point.
Online Passive-Aggressive Algorithms:
These algorithms are designed for online learning and are particularly useful in classifica-
tion tasks.
Perceptron Algorithm:
A simple online learning algorithm used for binary classification.
Online learning is applied
in various domains, including fraud detection, recommendation systems, anomaly detec-
tion, and other applications where data is continuously generated, and the model needs to
adapt to changes in the underlying patterns. It is particularly beneficial in situations where
the data is too large to fit into memory for batch processing or when the model needs to
respond quickly to new information.
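As an illustration of the stochastic gradient descent approach described above, here is a minimal sketch using scikit-learn's SGDRegressor, whose partial_fit method performs this kind of incremental update; the streamed mini-batches are synthetic:

import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)

rng = np.random.default_rng(0)
# Simulate a stream: ten mini-batches of (x, y) pairs with y close to 3x
for _ in range(10):
    X_batch = rng.uniform(0, 1, size=(20, 1))
    y_batch = 3 * X_batch.ravel() + rng.normal(0, 0.1, size=20)
    model.partial_fit(X_batch, y_batch)  # incremental update, no full retraining

print(model.coef_, model.intercept_)     # the slope should approach 3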
In instance-based learning, the system learns the examples by heart, then generalizes to new
cases by using a similarity measure to compare them to the learned examples. Instance-
based learning, also known as memory-based learning or lazy learning, is a type of machine
learning where the model is trained on the entire dataset and makes predictions for new in-
stances based on the similarity to instances in the training data. Instead of learning a general
model during training, instance-based learning stores the training examples and uses them
directly for prediction. Key characteristics of instance-based learning include:
Memory-Intensive:
Instance-based learning methods store the entire training dataset in memory. During pre-
diction, the algorithm identifies the most similar instances in the training set to the new
input and uses their information to make predictions.
No Explicit Model:
Instance-based learning does not build an explicit model during training. It relies on the
stored instances for making predictions.
Similarity Measure:
The choice of similarity measure is crucial in instance-based learning. Common similarity
measures include Euclidean distance, cosine similarity, or other metrics that quantify the
similarity between instances.
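A k-nearest-neighbors classifier is a classic instance-based learner. Here is a minimal sketch with invented points, where prediction simply compares the new input to the stored training instances:

from sklearn.neighbors import KNeighborsClassifier

# Stored training instances (memorized rather than compressed into a model)
X_train = [[1.0], [1.2], [3.8], [4.1]]
y_train = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # Euclidean distance by default
knn.fit(X_train, y_train)                  # essentially just stores the data

print(knn.predict([[1.1], [4.0]]))         # classified by similarity: [0 1]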
2.10 Model-based learning
Another way to generalize from a set of examples is to build a model of these examples and
then use that model to make predictions. This is called model-based learning. Model-based
learning is a machine learning paradigm in which a model is trained to make predictions or
decisions based on a given dataset. In contrast to instance-based learning, where the entire
dataset is stored for later use, model-based learning involves constructing a general model
during the training phase. This model is then used to make predictions on new, unseen data.
Key characteristics of model-based learning include:
Generalization:
Model-based learning aims to learn general patterns or relationships in the data that can be
applied to make predictions on new instances not seen during training. The goal is to create
a model that generalizes well to unseen data.
Parameterization:
The model is typically defined by a set of parameters that are learned from the training
data. The learning process involves adjusting these parameters to minimize a predefined
loss function, representing the difference between the model’s predictions and the actual
outcomes in the training data.
Explicit Representation:
The trained model provides an explicit representation of the underlying patterns in the data.
This representation can be used to make predictions or gain insights into the relationships
between input features and output predictions.
Computational Efficiency:
Once the model is trained, making predictions on new instances is usually computationally
efficient, as the model encapsulates the learned patterns without needing to store the entire
training dataset.
Common types of model-based learning include:
Linear Models:
Linear regression and logistic regression are examples of model-based learning algorithms
that use linear relationships between input features and output predictions.
Decision Trees:
Decision tree-based models, such as Random Forests or Gradient Boosted Trees, are con-
structed during training to capture non-linear relationships and complex decision bound-
aries.
Neural Networks:
Deep learning models, such as artificial neural networks, use layered architectures to learn
hierarchical representations of data.
Support Vector Machines (SVM):
SVM is a model-based learning algorithm that finds the optimal hyperplane to separate
different classes in the feature space. Model-based learning is widely used in various ap-
plications, and the choice of a specific model depends on the characteristics of the data and
the nature of the problem being addressed.
2.11 Conclusion
In conclusion, machine learning can be broadly categorized into three main types: su-
pervised learning, unsupervised learning, and reinforcement learning. Supervised learn-
ing involves training a model on labeled data to make predictions or classifications. Un-
supervised learning explores patterns and relationships in unlabeled data, often through
clustering or dimensionality reduction. Reinforcement learning, inspired by behavioral
psychology, focuses on an agent learning to make decisions through trial and error, with
rewards or penalties shaping its behavior. Each type serves distinct purposes and applica-
tions, collectively contributing to the diverse and powerful landscape of machine learning.
As technology advances, the integration of these approaches continues to drive innova-
tion across various fields, promising a future where machines can adapt, reason, and learn
autonomously.
Chapter 3
Algorithms In Machine Learning
Machine Learning (ML) algorithms are the core components of machine learning systems.
Machine Learning involves the use of algorithms to enable computers to learn patterns,
make decisions, and improve their performance on a task over time without being explicitly
programmed. There are various algorithms in machine learning, and they can be broadly
categorized into different types based on the nature of the task they are designed to solve.
Here are some common types of machine learning algorithms:
• Regression
• Support Vector Machines (SVM)
• Decision Trees
• Neural Networks
These are just a few examples, and there are many other algorithms and variations within
each category. The choice of algorithm depends on the specific task, the nature of the data,
and the goals of the machine learning project. Additionally, the field of machine learning
is dynamic, with ongoing research leading to the development of new algorithms and
improvements to existing ones. Here, we'll discuss a few of them.
3.1 Regression
Regression means to predict, forecast and to assess the relationship between a dependent
and independent variable. More generally:
A statistical technique that relates a dependent variable to one or more independent vari-
ables. The goal of regression analysis is to understand the nature of the relationship between
variables and make predictions based on that understanding. A regression model is able to
show whether changes observed in the dependent variable are associated with changes in
one or more of the explanatory variables. Regression algorithms are used if there is a re-
lationship between the input and output variable. It deals in numerical values. There are
different types of regression analysis, and two major categories are simple regression and
multiple regression.
If a single independent variable is used to predict the value of a numerical dependent vari-
able, then such a linear regression algorithm is called simple linear regression. Simple
regression involves modeling the relationship between a dependent variable and a single
independent variable. The key point in linear regression is that the dependent variable must
be a continuous real value. The most common form is linear regression, where the relation-
ship is assumed to be a straight line. The equation for simple linear regression is:
y = mx + b (3.1.1)
where y is the dependent variable, x is the independent variable, m is the slope, and b is
the intercept.
More generally, a linear regression model with several input features can be expressed as:
Y = b0 + b1 x1 + b2 x2 + . . . + bn xn + ε (3.1.2)
Linear regression is a fundamental and widely used algorithm in machine learning, partic-
ularly in the field of supervised learning. It is employed for tasks that involve predicting a
continuous outcome based on one or more input features. Here’s a brief overview of how
linear regression is used in machine learning:
Problem Formulation:
Linear regression is suitable for problems where the relationship between the input and the
output is assumed to be linear. The goal is to find the best-fitting straight line that mini-
mizes the difference between the predicted and actual values.
Training the Model:
The training process involves finding the values for the model parameters that minimize
the difference between the predicted and actual output values.
Evaluation:
Once the model is trained, it needs to be evaluated on new unseen data to assess its per-
formance. Common metrics for regression problems include mean squared error (MSE),
mean absolute error (MAE), and R-squared.
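For instance, these metrics can be computed with scikit-learn; the actual and predicted values below are made up:

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_actual = [3.0, 5.0, 7.5, 10.0]
y_predicted = [2.8, 5.4, 7.0, 10.3]

print(mean_squared_error(y_actual, y_predicted))   # MSE
print(mean_absolute_error(y_actual, y_predicted))  # MAE
print(r2_score(y_actual, y_predicted))             # R-squared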
Prediction:
After successful training and evaluation, the model can be used to make predictions on new
data by inputting the features into the trained model.
Applications:
Linear regression is used in various fields for tasks such as predicting house prices, sales
forecasts, stock prices, and many other scenarios where a linear relationship is assumed
between input features and the target variable.
Assumptions:
Linear regression assumes that the relationship between variables is linear, the residuals are
normally distributed, and the variance of residuals is constant. Linear regression serves as a
foundational building block for more complex models, and its simplicity and interpretabil-
ity make it a valuable tool in many machine learning applications.
3.3 Multiple Linear Regression
Multiple linear regression models the relationship between a dependent variable and several
independent variables. Here is a brief overview of multiple linear regression:
Model Representation:
The multiple linear regression model can be represented as:
Y = b0 + b1 x1 + b2 x2 + . . . + bn xn + ε (3.3.1)
where Y is the target variable, x1, x2, . . . , xn are the input features, b0, b1, . . . , bn are the
coefficients for each feature, and ε represents the error term.
Training the Model:
The training process involves finding the values for the coefficients that minimize the dif-
ference between the predicted and actual output values.
Matrix Notation:
The multiple linear regression equation can be written in matrix notation as:
Y = Xβ + ε (3.3.2)
where Y is the vector of target values, X is the matrix of input features (the design matrix),
β is the vector of coefficients, and ε is the vector of error terms.
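A minimal sketch of estimating β directly from this matrix form with NumPy's least-squares solver; the data is invented, and a column of ones is prepended so that the intercept b0 is estimated along with the feature coefficients:

import numpy as np

# Hypothetical data: four examples with two features each
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
Y = np.array([8.0, 7.0, 18.0, 17.0])

# Prepend a column of ones so the first coefficient plays the role of b0
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem min ||Y - X beta||^2 for beta = (b0, b1, b2)
beta, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
print(beta)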
Figure 3.2: Multilinear Regression Code in Python
Assumptions:
Like simple linear regression, multiple linear regression assumes that the relationship be-
tween variables is linear, there is little or no multicollinearity, the residuals are normally
distributed, and the variance of residuals is constant.
Interpretability:
Coefficients represent the change in the target variable for a one-unit change in the corre-
sponding input feature, assuming other features are held constant.
Feature Scaling:
It is often recommended to scale or normalize input features to ensure that all features contribute
equally to the model and to facilitate convergence during the training process.
Multiple linear regression is applied in various domains for tasks such as predicting sales
based on advertising spending in multiple channels, predicting a person’s income based on
education, experience, and other factors, and many other scenarios where multiple features
contribute to the target variable. Multiple linear regression is a powerful tool for modeling
complex relationships between multiple variables and is commonly used in data analysis
and predictive modeling when dealing with datasets that have multiple influencing factors.
3.4 Decision Trees
A decision tree is a popular machine learning algorithm used for both classification and
regression tasks. It works by recursively partitioning the data into subsets based on the
values of input features, creating a tree-like structure of decisions.
Figure 3.3: Decision tree representation
3.4.1 Components of Decision Trees
Root Node:
The topmost node in the tree, which represents the entire dataset. It is split into two or more
child nodes based on the most significant feature.
Decision Nodes:
Nodes represent a decision based on the value of a particular feature. Each decision node
leads to further nodes or leaves.
Leaves:
Nodes that do not split further and represent the final output or decision. In a classification
tree, each leaf corresponds to a class label, while in a regression tree, it represents a numeric
value.
Splitting:
The process of dividing a node into two or more child nodes based on a chosen feature
and a splitting criterion (e.g., Gini impurity for classification or mean squared error for
regression).
Decision Criteria:
The criteria for splitting at each node are determined by the algorithm during the training
process. For example, in a classification tree, it could be based on which feature and
threshold lead to the best separation of classes.
Recursive Process:
The process of splitting is applied recursively to each child node until a stopping criterion is
met. This could be a predefined depth of the tree, a minimum number of samples in a node,
or other criteria. Decision trees have several advantages, including interpretability, ease
of understanding, and the ability to handle both numerical and categorical data. However,
they can be prone to overfitting, especially when the tree is deep and captures noise in the
training data. Techniques like pruning and using ensemble methods (e.g., Random Forests)
can help mitigate this issue.
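As a brief illustration of how such a tree can be implemented, here is a minimal scikit-learn sketch using the Gini criterion mentioned above; the features and labels are invented:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [square footage, bedrooms] -> affordable (0) or expensive (1)
X = [[800, 2], [950, 2], [1800, 3], [2400, 4], [1200, 3], [3000, 5]]
y = [0, 0, 1, 1, 0, 1]

# Limiting the depth reduces the risk of overfitting
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["sqft", "bedrooms"]))  # the learned splits
print(tree.predict([[1600, 3]]))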
3.6 Conclusion
Chapter 4
Support Vector Machine
4.1 Definition:
Support vector machine is an algorithm of supervised machine learning used to solve
classification and regression problems, but it is mainly used in classification tasks. Its main
purpose is to draw a boundary line between two sets of data, so that it is easy to classify
which object belongs to which data set.
Figure 4.1 shows two data sets, one of triangles and one of squares, and between them is a
boundary line which separates both classes of data.
The initial aim of a support vector machine is to create a boundary line, or decision boundary,
between two classes of data. Its other purpose is to form a hyperplane and to maximize
the size of the margin from the boundary line to the nearest data points.
• Hyperplanes:
It is a decision boundary, or a subspace of the feature space, which helps to separate the
data points into different classes. There are numerous decision boundaries in the feature
space, but the decision boundary which best classifies the data points is known as the
optimal hyperplane. The hyperplane is always formed where the margin between the data
points is maximum.
• N Dimensional Hyperplanes:
Usually we see support vector machines in the 2-D plane, while hyperplanes exist in
N-dimensional spaces, where N is the number of features or dimensions in the dataset.
In 2-D, a hyperplane is a line, while in 3-D it is a plane.
• A hyperplane can also be represented in linear equation form:
wT x + b = 0 (4.3.1)
where w represents the weight vector, x represents the data points, and b is the bias term
or intercept which shifts the hyperplane away from the origin.
• Optimal Hyperplane:
In SVM, it is the one that has maximum margin while satisfying constraints that all data
points are correctly classified.
• Support Vectors:
These are the data points closest to the hyperplane, which are critical in determining the
hyperplane. They are used to determine the position of the hyperplane and to define the
margins. These points have a direct impact on the calculation of the margins.
Figure 4.3: Support Vectors
• Margins:
It is the distance between the hyperplane and the nearest data points from either class, on
each side of the decision boundary. Support vector machines maximize the margin because
a larger margin leads to better generalization and a lower risk of overfitting, as shown in
the figure below:
Figure 4.4: Margins
• Kernel Function:
The kernel function is useful for both linearly separable and non-linearly separable data. Its
basic function is to convert non-separable data from a lower-dimensional space to a
higher-dimensional space and make it easily separable by a linear hyperplane.
Figure 4.6: 1D TO 2D
Figure 4.7: 2D TO 3D
Figure 4.6 illustrates how a kernel works to separate inseparable data by increasing the
dimension from 1-D to 2-D. Similarly, figure 4.7 illustrates the conversion of inseparable
data in the 2-D plane to 3-D space.
Mainly SVM has two types and their significance is mentioned below:
• Linear Support Vector Machine:
In a linear support vector machine, a straight line can divide the data into separate classes.
Linear SVM is suitable when the data is already linearly separable, meaning that in the
2-D plane a single straight line differentiates the classes. The hyperplane which maximizes
the margin between the classes is called the decision boundary.
• Non Linear Support Vector Machine:
In a nonlinear support vector machine, the data is not linearly separable; rather, it is in
clustered form, and it is nearly impossible to separate the data with a straight line as all
the data sets are mixed together. So, in this case, we use a kernel function, which separates
the data by converting it from a lower dimension into a higher dimension, as described in
figure 4.7.
Let's try to understand the working of SVM with an example. Suppose we see a strange cat
with some features resembling a dog, and we want to know whether it is a cat or a dog. In
this case, we can use a support vector machine to decide accurately what it is. For
prediction, we train our model on an input having a lot of images of cats and dogs, through
which the model learns by analyzing the features of both animals; we then test it with the
object we want to identify. SVM creates a decision boundary between the two classes,
analyzes the new object thoroughly against the data, and compares it with the features of
both. After that, with the help of the support vectors, it will predict that it is a cat and not
a dog. SVM is also used in various areas like detecting faces, recognizing images, and
categorizing text.
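A minimal sketch of such a classifier with scikit-learn's SVC and an RBF kernel; the two "cat versus dog" features below are invented for illustration:

from sklearn.svm import SVC

# Hypothetical features: [ear length, snout length] for cats (0) and dogs (1)
X = [[3.0, 2.0], [2.8, 2.2], [3.2, 1.9], [7.5, 6.0], [8.0, 6.5], [7.8, 5.8]]
y = [0, 0, 0, 1, 1, 1]

# The RBF kernel lets the decision boundary be nonlinear if the data requires it
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print(clf.support_vectors_)       # the points that define the margin
print(clf.predict([[3.1, 2.1]]))  # predicted class for a new animal: cat (0)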
• Flexibility: Support vector machines are applicable to both regression and classification
problems and have numerous functions to tackle complex tasks.
• Handling of nonlinear data:
Support vector machine can handle nonlinear data easily by the use of kernel trick by
changing the dimensions of data.
• Effectuality of kernel function:
Kernel function is effective when we are dealing with nonlinear data which is not easily
separable. It helps in converting the data from lower dimensions to higher dimensions and
make it linear.
• Scalability:
It is always difficult to train the model when the samples are in millions. This always seems
impractical due to memory shortage and some computational restrictions.
• Computational restrictions:
When dealing with large sets of data computationally, it costs a lot, learning takes a lot of
time, and the memory requirement increases gradually.
• Lack of accuracy:
When working with large data sets, support vector machines do not show accurate or good
results.
4.9 Conclusion:
In conclusion, the support vector machine is a very versatile machine learning algorithm
widely used to solve regression and classification problems. An SVM's main purpose is to
draw an optimal hyperplane which maximizes the margin between the data points and
separates the data into accurate classes. Its versatility lies in dealing with nonlinear data
by using kernel tricks to change the dimensions. Despite all this, SVM lacks accuracy when
dealing with large datasets. Still, it remains one of the most powerful tools in machine
learning.
Chapter 5
Neural Networks
5.1 Definition:
A neural network is a machine learning model inspired by the structure and working of the
human brain for processing information. Neural networks are complex networks of
interconnected neurons, or nodes, which solve difficult problems by learning from training
data, then testing and giving output.
5.2 Applications of Neural Networks:
Nowadays, neural networks are used behind many complex algorithms, such as:
• Detecting faces and understanding spoken language in different applications like voice
assistance.
• Language recognition.
• Handwriting recognition by analyzing how you write different alphabets and numbers.
• Predicting stock rates, fraud prediction.
• In medicine, diagnosing diseases and viewing different reports and images like X-rays,
CT scans, etc.
5.3 Structure of neural networks:
As discussed earlier, neural networks are complex networks of interconnected neurons or
nodes. A network has input and output layers. The information passes between different
nodes through interconnected links. These links carry different weights, and hidden layers
sit between the input and output layers, as shown in figure 5.2.
5.3.1 Components of neural networks:
• Neuron/nodes
• Input layer
• Output layer
• Weights
• Hidden layers
• Loss calculations
• Adjusting weights
• Activation function
• Training
• Gradient descent
Figure 5.2: Components
5.4 Working of neural networks:
Let's take a real-world example of email spam detection to see how a neural network works.
As discussed above, we have input and output layers in our neural network. All emails first
go through the input layer, which checks the sender's information, subject, and content.
Then, having been trained on all the data, the network decides whether the email is spam
or not: with the help of a binary activation function, the output layer detects whether it is
legitimate or not. A neural network is a complex network based on a few features our brain
has. It has three types of layers: input layer, output layer, and hidden layers. Mainly, this
process comprises two stages: forward propagation and backward propagation.
5.4.1 Forward propagation:
In forward propagation, input data proceeds in the forward direction through the hidden
layers, and these hidden layers feed data to successive layers. It has the following components:
• Neurons/nodes:
These are data processing units which pass data through each other.
• Input layer:
It consists of different nodes which receive data and pass it to the hidden layers.
• Output layer:
It gives us the output after processing all the input data through hidden layers.
• Weights:
A weight tells us about the strength of the connection between two nodes. The weight
assigned to each connection is not constant; it varies throughout the process.
• Hidden layers:
These are the most important components of neural networks. They help neural networks
learn from complex input data and give accurate results. Their number depends upon how
complex the problem is.
5.4.2 Backward propagation:
In backward propagation, we move backward from the output node towards the input node,
propagating errors. It is very helpful in increasing the accuracy of our predictions. Its
components are given below:
• Loss calculations:
It tells us the difference between the targeted output and the predicted output. In regression
problems, we call it the mean squared error. Mathematically, it is defined as
MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)² (5.4.1)
• Adjusting weights:
We adjust the weight of every connection between two nodes by using backward propagation
across our neural network.
• Training:
We have a large sample of data during training. Throughout the process, forward propagation,
backward propagation, and loss calculation occur again and again so that our network
learns the data pattern.
• Activation Function:
The activation function performs a nonlinear transformation on our input so that the
network can learn and solve complex problems. Otherwise, it would be a simple linear
regression problem.
• Gradient Descent:
This function helps in reducing the inaccuracy and loss of our output by changing the
weights, and the weights are changed by taking the derivative of the loss with respect to
each weight.
5.5 Implementation of neural networks:
import numpy as np

# Sigmoid activation function used by both layers
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Updated values for X and Y
X = np.array([[0.5, 1.0, 1.5], [1.5, 2.0, 0.5], [1.0, 2.5, 1.5]])
Y = np.array([[0.2, 0.1, 0.3]]).T

sigm = 2                             # fixed weight applied by the first layer
delt = np.random.random((3, 3)) - 1  # random initial weights for the second layer

# Training loop: forward propagation, then a backward-propagation weight update
for j in range(100):
    l1 = sigmoid(np.dot(X, sigm))    # hidden-layer activations
    l2 = sigmoid(np.dot(l1, delt))   # network output
    m1 = (Y - l2) * l2 * (1 - l2)    # output error term (derivative of the loss)
    delt += l1.T.dot(m1)             # adjust the second-layer weights

print(l2)
Output:
0.99753435 0.99754312 0.99757271
0.9996661 0.99966809 0.99967704
0.99995482 0.99995515 0.99995639
5.6 Learning in neural networks:
Neural networks can be trained under each of the main learning paradigms:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
5.7 Types of neural networks:
5.8 Advantages of neural networks:
• Multiple processing:
Neural networks have the ability to perform multiple tasks, which means they can solve
more than one problem at a time.
• Fault tolerance:
Neural networks have the ability to tolerate faults. If one or more nodes are faulty, it does
not affect the working of the whole network model.
• Non-linearity:
Activation functions make our model non-linear, which helps it learn and solve more
complex problems easily.
• Processing of unorganized data:
Neural networks have the ability to sort and categorize large amounts of data by processing it.
• Pattern recognition:
Neural networks are very effective in image recognition, natural language processing and
analyzing many other data patterns.
• Versatility:
Neural networks are very fast and efficient in learning new data. They are useful where the
links between inputs and outputs are not well defined.
5.9 Disadvantages of neural networks:
• Black box model:
Because of the black box nature of neural networks, how predictions are being made and
how data is categorized becomes difficult to understand.
• Hardware dependent:
Neural networks require good processors to make the model reliable and the performance
of the network smooth, which shows that they are highly dependent upon hardware.
5.10 Conclusion:
In conclusion, neural networks are one of the most powerful tools in machine learning,
inspired by the ability of the human brain to learn and adapt. They have been very
successful in many areas, from recognizing images to natural language processing, which
shows how vast the field has become in so little time. With their ability to solve complex
problems across various fields, neural networks have a very important role in transforming
artificial intelligence in the future.
Chapter 6
Problem Solving
These exercise problems are taken from Chapter 2 of the book "Hands-On Machine Learning
with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron.
6.1 Problem 01
Try a Support Vector Machine regressor (sklearn.svm.SVR) with various hyperparameters,
such as kernel="linear" (with various values for the C hyperparameter) or kernel="rbf"
(with various values for the C and gamma hyperparameters). Don't worry about what these
hyperparameters mean for now. How does the best SVR predictor perform?
Solution:
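The solution in the original is given as screenshots (figures 6.1 to 6.4). A minimal sketch of this kind of grid search, using scikit-learn's built-in California housing data as a stand-in for the chapter's prepared dataset:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = fetch_california_housing(return_X_y=True)
X, y = X[:1000], y[:1000]  # subsample so the SVR grid search stays fast

param_grid = [
    {"kernel": ["linear"], "C": [10.0, 100.0, 1000.0]},
    {"kernel": ["rbf"], "C": [10.0, 100.0, 1000.0], "gamma": [0.01, 0.1, 1.0]},
]

# 3-fold cross-validation over both kernels, scored by negative MSE
grid_search = GridSearchCV(SVR(), param_grid, cv=3,
                           scoring="neg_mean_squared_error")
grid_search.fit(X, y)
print(grid_search.best_params_)  # the best kernel / C / gamma combination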
Figure 6.1: SVR Model
Figure 6.2: SVR Model
Figure 6.3: SVR Model
Figure 6.4: Results
6.2 Problem 02
Figure 6.5: GridsearchCV Model
Figure 6.6: GridsearchCV Model
Figure 6.7: Results
6.3 Problem 03
Try adding a transformer in the preparation pipeline to select only the most important
attributes.
Solution:
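The solution in the original is shown in the figures below. One common way to realize such a transformer is sketched here with SelectKBest inside the pipeline; the book's own solution may select attributes differently (for example, by feature importances), and the data and k are illustrative:

from sklearn.datasets import fetch_california_housing
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)

# Preparation pipeline with a feature-selection step before the model
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression, k=5)),  # keep the 5 best attributes
    ("model", LinearRegression()),
])
pipeline.fit(X, y)
print(pipeline.score(X, y))  # R-squared on the training data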
Figure 6.9: Pipeline Model
Figure 6.10: Results
6.4 Problem 04
Try creating a single pipeline that does the full data preparation plus the final prediction.
Solution:
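The solution in the original is shown in the figures below. A minimal sketch of the idea, again with the built-in California housing data standing in for the chapter's dataset, chains preparation and the final predictor in one pipeline:

from sklearn.datasets import fetch_california_housing
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = fetch_california_housing(return_X_y=True)
X, y = X[:1000], y[:1000]  # subsample to keep the example quick

# Full pipeline: data preparation plus the final prediction model
full_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf", C=100.0)),
])
full_pipeline.fit(X, y)
print(full_pipeline.predict(X[:5]))  # prepared and predicted in one call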
Figure 6.12: Single Pipeline
Figure 6.13: Single Pipeline
Figure 6.14: Single Pipeline
6.5 Problem 05
Figure 6.17: Model for GridsearchCV
Figure 6.18: Results
References
[1] Charu C. Aggarwal, Neural Networks and Deep Learning.
[2] Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
[3] https://www.techtarget.com/searchenterpriseai/definition/neural-network
[4] https://www.analyticsvidhya.com/blog/2022/01/introduction-to-neural-networks/#h-how-does-a-neural-network-work
[5] https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/
[6] https://www.baeldung.com/cs/hidden-layers-neural-network
[7] https://towardsdatascience.com/forward-propagation-in-neural-networks-simplified-math-and-code-version-bbcfef6f9250
[8] https://www.sciencedirect.com/topics/computer-science/artificial-neural-network
[9] https://www.techtarget.com/searchenterpriseai/definition/backpropagation-algorithm
[10] https://towardsdatascience.com/loss-functions-and-their-use-in-neural-networks-a470e703f1e9
[11] https://www.javatpoint.com/unsupervised-artificial-neural-networks
[12] https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141
[13] https://www.geeksforgeeks.org/support-vector-machine-algorithm
[14] https://uedufy.com/calculate-multiple-linear-regression-using-spss
[15] https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm