UNIT-1 DLL
Introduction to ML
Machine learning gives computers the capability to learn without being explicitly programmed. So instead of you writing the code, what you do is you feed data to the algorithm, and the algorithm/machine builds the logic based on the given data.
Now the first step is to train the machine with all different fruits
one by one like this:
If the shape of the object is rounded with a depression at the top and its color is Red, then it will be labelled as Apple.
If the shape of the object is a long curving cylinder and its color is Green-Yellow, then it will be labelled as Banana.
Now suppose that, after training, the machine is given a new, separate fruit (say a Banana) from the basket and asked to identify it.
Since the machine has already learned from the previous data, it now has to use that knowledge wisely. It will first classify the fruit by its shape and color, and would confirm the fruit name as BANANA and put it in the Banana category.
Thus the machine learns from the training data (the basket containing fruits) and then applies that knowledge to the test data (the new fruit).
Supervised learning is classified into two categories of algorithms:
Classification: A classification problem is when the output variable is a category, such as "Red" or "Blue", or "disease" and "no disease".
Regression: A regression problem is when the output variable is a
real value, such as price or weight or height.
Split the dataset into a training dataset, a test dataset, and a validation dataset.
Determine the input features of the training dataset, which should
have enough knowledge so that the model can accurately predict the
output.
Execute the algorithm on the training dataset. Sometimes we need validation sets as control parameters, which are a subset of the training datasets.
Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output, it means our model is accurate.
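As an illustration of these steps, here is a minimal, hedged scikit-learn sketch; the Iris dataset, the 80/20 split, and the decision tree classifier are assumptions chosen for demonstration, not part of the notes above.

```python
# Minimal supervised-learning workflow: split the data, train, evaluate accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # labelled data (features X, labels y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)    # split into training and test sets

model = DecisionTreeClassifier()             # choose a suitable algorithm
model.fit(X_train, y_train)                  # execute the algorithm on the training set

y_pred = model.predict(X_test)               # predict on unseen test data
print("Accuracy:", accuracy_score(y_test, y_pred))  # evaluate the accuracy
```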
Advantages of supervised learning: the model can predict the output on the basis of prior experience, and we can have an exact idea about the classes of objects.
Disadvantages of supervised learning: supervised models are not suitable for handling complex tasks, and they require enough knowledge about the classes of object.
Unsupervised learning
Clustering: an unsupervised learning technique that groups the data into clusters based on the similarities between data points.
Association: an unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly.
The working of unsupervised learning can be understood by the below diagram.
Firstly, it will interpret the raw data to find the hidden patterns
from the data and then will apply suitable algorithms such as
K-means clustering, Decision tree, etc.
Reinforcement Learning
The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.
The agent is not told beforehand what the right thing to do is; it has to find out what it did that made it get the reward or the penalty. Reinforcement learning is used in applications such as self-driving cars and some forms of robots, e.g., drones, warehouse robots, and more. In reinforcement learning, an agent learns through trial and error. The agent takes actions within the environment, receives rewards or penalties, and adjusts its behaviour accordingly.
Comparison of Supervised, Unsupervised, and Reinforcement Learning

Data Requirement
Supervised Learning: Requires a dataset with input-output pairs; the data must be labelled.
Unsupervised Learning: Works with unlabelled data; no need for input-output pairs.
Reinforcement Learning: No predefined dataset; the agent learns from interactions with the environment, through trial and error, by receiving rewards or penalties.

Output
Supervised Learning: A predictive model that maps inputs to outputs.
Unsupervised Learning: A model that identifies the data's patterns, clusters, associations, or features/structures.
Reinforcement Learning: A policy or strategy that specifies the action to take in each state of the environment.

Goal
Supervised Learning: Minimize the error between predicted and actual outputs.
Unsupervised Learning: Discover the underlying structure of the data.
Reinforcement Learning: Maximize cumulative reward over time (the reward is not necessarily immediate).

Challenges
Supervised Learning: Needs large labelled datasets.
Unsupervised Learning: Interpretation of the results can be difficult.
Reinforcement Learning: Can be hard in environments with sparse rewards.
Neural Networks
Similar to the human brain, which has neurons interconnected to one another, artificial neural networks also have neurons that are interconnected to one another in various layers of the network. These neurons are known as nodes, and the network built from them is called an Artificial Neural Network.
The typical Artificial Neural Network looks something like the given
figure.
Relationship between the biological neural network and the artificial neural network:
There are around 1000 billion neurons in the human brain. Each neuron has an association point somewhere in the range of 1,000 to 100,000.
In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data when necessary from our memory in parallel. We can say that the human brain is made up of incredibly powerful parallel processors. We can understand an artificial neural network with the example of a digital logic gate that takes an input and gives an output. Consider an "OR" gate, which takes two inputs. If one or both the inputs are "On," then we get "On" in the output. If both the inputs are "Off," then we get "Off" in the output. Here the output depends upon the input.
Our brain does not perform the same fixed task. The relationship of outputs to inputs keeps changing because of the neurons in our brain, which are "learning."
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer is present in-between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in the output that is conveyed using this layer.
The artificial neural network takes the input and computes the weighted sum of the inputs together with a bias. This computation is represented in the form of a transfer function.
The weighted total is passed as an input to an activation function, which decides whether a node should fire or not. Only those nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.
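To make the weighted-sum-plus-activation idea concrete, here is a minimal NumPy sketch of a single artificial neuron; the sigmoid activation and the example inputs, weights, and bias are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Squashes the weighted total into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through the activation function
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, 0.2, 0.1])    # example inputs
w = np.array([0.4, 0.7, -0.3])   # example weights
b = 0.1                          # bias
print(neuron(x, w, b))
```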
Parallel processing capability: artificial neural networks work with numerical values and can perform more than one task simultaneously.
Capability to work with incomplete knowledge: after ANN training, the network may produce output even with inadequate data.
However, if the event or problem can't appear to the network in all its aspects, it can produce false output.
Learning Algorithms:
The steps for the Support Vector Machine (SVM) algorithm are:
Select a kernel function: choose a kernel function suitable for the problem, such as a linear, polynomial, or RBF kernel.
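The remaining SVM steps are not shown above, so here is only a minimal, hedged scikit-learn sketch of kernel selection and training; the RBF kernel, the C value, and the Iris data are illustrative assumptions.

```python
# Minimal SVM sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)    # select a kernel function for the problem
clf.fit(X_train, y_train)         # train the SVM on the training data
print(clf.score(X_test, y_test))  # accuracy on the test set
```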
Random Forest: instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority votes of predictions, it predicts the final output. The below diagram explains the working of the Random Forest algorithm:
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some decision trees may predict the correct output, while others may not. But together, all the trees predict the correct output. Therefore, below are two assumptions for a better Random Forest classifier:
There should be some actual values in the feature variables of the dataset, so that the classifier can predict accurate results rather than guessed results.
The predictions from each tree must have very low correlations.
Random Forest can also maintain accuracy when a large proportion of the data is missing.
Working of Algorithm
The working process can be explained in the below steps and diagram:
Step-3: Choose the number N for decision trees that you want to
build.
Step-4: Repeat the earlier steps to build the N trees.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority votes.
The working of the algorithm can be better understood by the below example:
During the training phase, each decision tree produces a prediction result, and when a new data point occurs, then based on the majority of the results, the final output is predicted by the Random Forest algorithm.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and prevents the overfitting issue.
It can be used for both Classification and Regression tasks.
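For a concrete picture of the majority-vote idea, here is a hedged scikit-learn sketch; the Iris dataset and the choice of 100 trees are assumptions for demonstration only.

```python
# Illustrative Random Forest sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100)  # build N decision trees
forest.fit(X_train, y_train)                       # each tree sees a random subset
print(forest.score(X_test, y_test))                # prediction by majority vote
```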
Linear Regression:
Linear regression is one of the easiest and most popular machine learning algorithms; it is a statistical method that is used for predictive analysis.
It makes predictions for continuous/real numeric variables.
Linear regression is based on supervised learning.
Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis). If there is a single input variable, it is called simple linear regression, and if there is more than one input variable, then such linear regression is called multiple linear regression.
When working with the linear regression algorithm, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error. Different values of the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line, and to calculate this we use the cost function.
Cost function
The cost function is used to estimate the values of the coefficients for the best fit line, and it measures how well a linear regression model is performing.
We can use the cost function to find the accuracy of the mapping function that maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:
MSE = (1/N) * Σ (Yi − (a1xi + a0))²
Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
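As a quick numerical sketch of fitting y = a0 + a1x and computing the MSE cost, consider the following; the small example data is made up purely for illustration.

```python
# Fit a simple linear regression and compute the Mean Squared Error.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

a1, a0 = np.polyfit(x, y, deg=1)       # best-fit slope (a1) and intercept (a0)
y_pred = a1 * x + a0                   # predicted values
mse = np.mean((y - y_pred) ** 2)       # Mean Squared Error cost
print(a0, a1, mse)
```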
Applications of linear regression include trend analysis, salary forecasting, and real estate price prediction.

K-Nearest Neighbour (K-NN)
K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the Supervised Learning technique.
o K-NN algorithm assumes the similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
o K-NN can be used for Regression as well as Classification, but it is mostly used for Classification problems.
o K-NN is a lazy learner: at the training phase it just stores the dataset, and when it gets new data, then it classifies that data into a category that is most similar to the new data.
o Example: suppose we have an image of a creature that looks similar to both a cat and a dog. The KNN model will find the features of the new data that are similar to the cats and dogs images, and based on the most similar features it will put it in either the cat or the dog category.
Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1; in which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. The working of K-NN can be explained on the basis of the below algorithm:
Step-1: Select the number K of the neighbors.
Step-2: Calculate the Euclidean distance to the neighbors and take the K nearest neighbors as per the calculated Euclidean distance.
Step-3: Among these K neighbors, assign the new data point to the category with the most neighbors. Consider the below image:
o Choosing the value of K can sometimes cause difficulties.
o It is simple to implement.
o The computation cost can be high because the distance must be calculated between the new point and all the training samples.
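Here is a minimal, hedged K-NN sketch with scikit-learn; the choice of K=5 and the Iris data are illustrative assumptions, not prescribed by the notes.

```python
# Minimal K-Nearest Neighbour classifier sketch.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # Step-1: select the number K
knn.fit(X_train, y_train)                   # lazy learner: just stores the data
print(knn.predict(X_test[:3]))              # classify new points by nearest neighbours
print(knn.score(X_test, y_test))
```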
Decision Trees
Decision Tree is a Supervised learning technique that can be used
In a Decision tree, there are two kinds of nodes, which are the Decision Node and the Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
The goal of using a Decision Tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
The decisions or the test are performed on the basis of features of the
given dataset.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after a leaf node.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root node of the tree.
This algorithm compares the values of the root attribute with the record (real dataset) attribute and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further.
It continues the process until it reaches the leaf node of the tree.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
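A hedged scikit-learn sketch of this idea is given below; the features (salary, distance) and the tiny "accept the offer?" dataset are invented purely for demonstration.

```python
# Illustrative decision tree for a made-up "accept the job offer?" problem.
from sklearn.tree import DecisionTreeClassifier

# Features: [salary_in_lakhs, distance_in_km]; label: 1 = accept, 0 = decline
X = [[12, 5], [6, 30], [15, 10], [7, 40], [14, 8], [5, 25]]
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier()
tree.fit(X, y)                       # splits on attributes from the root node down
print(tree.predict([[10, 12]]))      # follow branches until a leaf node is reached
```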
Step-3: Assign each data point to their closest centroid, which will
form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.
Step-5: Repeat the third step, which means reassign each data point
to the new closest centroid of each cluster.
Example:
Let's take number k of clusters, i.e., K=2, to identify the dataset and
to put them into different clusters. It means here we will try to group
these datasets into two different clusters.
Now we will assign each data point of the scatter plot to its closest K-
point or centroid. We will compute it by applying some mathematics
that we have studied to calculate the distance between two points. So,
we will draw a median between both the centroids.
From the above image, it is clear that the points on the left side of the line are near to the K1 or blue centroid, and the points to the right of the line are close to
the yellow centroid. Let's color them as blue and yellow for clear
visualization.
As we have got the new centroids, we will again draw the median line and
reassign the data points. So, the image will be:
We can see in the above image that there are no dissimilar data points on either side of the line, which means our model is formed.
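The same clustering procedure can be sketched with scikit-learn, using K=2 as in the example above; the sample 2-D points are made up for illustration.

```python
# Minimal K-means sketch: assign points to the closest centroid, recompute, repeat.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # K = 2 clusters
kmeans.fit(points)
print(kmeans.cluster_centers_)     # final centroids
print(kmeans.labels_)              # cluster assignment of each data point
```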
Q Learning Algorithm
Q-learning is a model-free reinforcement learning algorithm that can be used even when the dynamics of the environment are unknown.
The agent does not need prior knowledge. The agent learns from both the immediate reward and the estimated future rewards.
2. Q-Values (Action-Values)
Q-values are updated to reflect better estimates, guiding the agent towards actions with higher long-term reward.
With this policy, the agent selects a random action with a small probability (exploration) and otherwise chooses the action with the highest Q-value (exploitation). The Q-values converge over several episodes.
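As a concrete sketch of the Q-value update, here is a minimal tabular Q-learning loop in Python; the toy chain environment, the learning rate (alpha), discount factor (gamma), and epsilon values are all illustrative assumptions.

```python
import random

# Tabular Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',a') - Q(s,a))
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    # Toy chain environment: action 1 moves right, reward 1 on reaching the last state.
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):                       # learn over several episodes
    state = 0
    for _ in range(50):                          # cap the number of steps per episode
        if random.random() < epsilon:            # explore with a small probability
            action = random.randrange(n_actions)
        else:                                    # otherwise exploit the highest Q-value
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
        if state == n_states - 1:                # episode ends at the goal state
            break

print(Q)
```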
Building a Machine Learning Model
1. Data Collection
Machine learning requires training data, a lot of it. This data can either be
labelled meaning Supervised Learning or not labelled meaning
Unsupervised Learning.
Datasets pre-collected from sites like Kaggle, UCI, etc. form the basis of your machine learning project. Data can also be collected through user surveys, analysis reports, trends, usage metrics, etc.
2. Data Preparation
Some datasets may not require data preparation at all, while for others the data preparation step takes up the majority of the ML model build time.
Data can also be randomized, which erases the effects of the particular
order in which we collected and/or otherwise prepared our data.
Later, the data can be split into training, testing, and evaluation sets.
3. Choosing a Model
The third step consists of selecting the right model. There are many models which can be used for many different purposes. Once the model is selected, it needs to meet the specific goal.
4. Training the Model
Training a model forms the basis of machine learning. The goal is to use our training data and improve the predictions of our model.
5. Evaluating the Model
The model is then tested against previously unseen data. The unseen data is meant to be representative of model performance in the real world, but it still helps tune the model.
6. Parameter Tuning
Once evaluation is complete, the model can often be improved further by tuning parameters such as the learning rate or the number of training iterations.
7. Make Predictions
Our machine learning model can make predictions ranging from image
recognition to predictive analytics to natural language processing.
After building, the model needs to be tested on a testing set to check how it performs on unseen data. This helps to further evaluate the model and provides a better approximation of its real-world performance.
Likelihood describes how to find the best distribution of the data for
some feature or some situation in the data given a certain value of some
feature or situation, while probability describes how to find the chance
of something given a sample distribution of data.
Consider a dataset containing the weight of the customers. Let’s say the
mean of the data is 70 & the standard deviation is 2.5.
When Probability has to be calculated for any situation using this dataset,
then the mean and standard deviation of the dataset will be constant. Let’s
say the probability of weight > 70 kg has to be calculated for a random
record in the dataset, then the equation will contain weight, mean and
standard deviation.
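For instance, with the mean and standard deviation fixed at 70 and 2.5, the probability of a weight above the mean can be computed from the normal distribution. The sketch below assumes a normal distribution and uses scipy.stats purely for illustration.

```python
# Probability of weight > 70 kg under a normal distribution with mean 70 and sd 2.5.
from scipy.stats import norm

mean, std = 70, 2.5
p = 1 - norm.cdf(70, loc=mean, scale=std)   # P(weight > 70) = 0.5 here, since 70 is the mean
print(p)
```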
So MLE will calculate the possibility for each data point of the salary and then, by using that possibility, it will calculate the likelihood of those data points to classify them as either 0 or 1. It will repeat this process until the line is best fitted. This process is known as the maximization of likelihood.
The above explains the scenario: as we can see, there is a threshold of 0.5, so if the possibility comes out to be greater than that, the point is labelled as 1, otherwise 0.
Artificial Neural Networks contain artificial neurons which are called
units. These units are arranged in a series of layers that together
constitute the whole Artificial Neural Network in a system.
A layer can have only a dozen units or millions of units as this
depends on how the complex neural networks will be required to
learn the hidden patterns in the dataset.
Commonly, Artificial Neural Network has an input layer, an output
layer as well as hidden layers.
The input layer receives data from the outside world which the neural
network needs to analyze or learn about. Then this data passes
through one or multiple hidden layers that transform the input into
data that is valuable for the output layer. Finally, the output layer
provides an output in the form of a response of the Artificial Neural
Networks to input data provided.
In the majority of neural networks, units are interconnected from one
layer to another. Each of these connections has weights that determine
the influence of one unit on another unit. As the data transfers from
one unit to another, the neural network learns more and more about
the data which eventually results in an output from the output layer.
The input layer of an artificial neural network is the first layer, and it
receives input from external sources and releases it to the hidden layer,
which is the second layer. In the hidden layer, each neuron receives input
from the previous layer neurons, computes the weighted sum, and sends it
to the neurons in the next layer.
These connections are weighted, which means the effects of the inputs from the previous layer are optimized more or less by assigning different weights to each input, and these weights are adjusted during the training process to improve model performance.
Non-linear classification example using Neural Networks: XOR
Among the various logical gates, XOR, also known as the "exclusive or" problem, is a logical operation on binary inputs that yields an output of 1 when the inputs differ and 0 when the inputs are the same. The outputs generated by the XOR logic are not linearly separable in the hyperplane.
From the below truth table, it can be inferred that XOR produces an output for different states of the inputs, and for the same inputs the XOR logic does not produce any output. The output of the XOR logic is given by the equation shown below:
Y = A.B' + A'.B
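Since XOR is not linearly separable, a single-layer perceptron cannot represent it, but a network with one hidden layer can. The sketch below uses hand-chosen weights (an illustrative assumption, not a trained network) to show how two hidden units, roughly an OR unit and an AND unit, combine to produce XOR.

```python
import numpy as np

# A tiny 2-2-1 network with hand-chosen weights that computes XOR.
def step(z):
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # XOR truth-table inputs

W1 = np.array([[1, 1],     # both hidden units see A + B
               [1, 1]])
b1 = np.array([-0.5, -1.5])  # hidden unit 1 ~ OR(A, B), hidden unit 2 ~ AND(A, B)
W2 = np.array([1, -1])       # output ~ OR AND NOT(AND) == XOR
b2 = -0.5

hidden = step(X @ W1 + b1)        # hidden-layer activations
output = step(hidden @ W2 + b2)   # network output
print(output)                     # [0 1 1 0]
```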
The linear separability of points
The input layer receives the raw data features. Each neuron in this
layer corresponds to a specific feature in the input data.
The hidden layer, of which there can be more than one, processes the
data it receives. Hidden layer neurons apply weights, biases and
activation functions.
The output layer produces the final output predictions. Neurons in this
layer represent different possible outputs of the model.
gradient descent helps the system minimize the gap between desired
outputs and actual system outputs. The algorithm tunes the system by
adjusting the weight values for various inputs to narrow the difference
between outputs. This is also known as the error between the two.
Y=mX+c
Where 'm' represents the slope of the line, and 'c' represents the intercepts
on the y-axis.
The slope is steeper at the starting (arbitrary) point, but as new parameters are generated the steepness gradually reduces, and the algorithm approaches the lowest point, which is called the point of convergence.
These two factors, the direction and the learning rate, are used to determine the partial derivative calculation of the next iteration and allow the algorithm to reach the point of convergence, i.e., the local minimum or the global minimum. Let's discuss the learning rate in brief:
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point.
This is typically a small value that is evaluated and updated based on the
behavior of the cost function. If the learning rate is high, it results in
larger steps but also leads to risks of overshooting the minimum. At the
same time, a low learning rate results in small step sizes, which compromise overall efficiency but give the advantage of more precision.
Batch gradient descent (BGD) is used to find the error for each point in
the training set and update the model after evaluating all training
examples. This procedure is known as the training epoch. In simple
words, it is a greedy approach where we have to sum over all examples
for each update.
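A minimal NumPy sketch of batch gradient descent for the line Y = mX + c is given below; the sample data, the learning rate, and the number of epochs are illustrative assumptions.

```python
import numpy as np

# Batch gradient descent for Y = m*X + c: update after seeing all examples (one epoch).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 6.0, 8.1, 9.9])

m, c = 0.0, 0.0
learning_rate = 0.01
n = len(X)

for epoch in range(1000):                 # each pass over all examples is one epoch
    Y_pred = m * X + c
    error = Y_pred - Y
    dm = (2.0 / n) * np.sum(error * X)    # partial derivative of MSE w.r.t. m
    dc = (2.0 / n) * np.sum(error)        # partial derivative of MSE w.r.t. c
    m -= learning_rate * dm               # step toward the point of convergence
    c -= learning_rate * dc

print(m, c)
```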
Curse of Dimensionality
Curse of Dimensionality arises when working with high-dimensional
data, leading to increased computational complexity, overfitting, and
spurious correlations.
2. Data Preprocessing:
Normalization: Scale the features to a similar range to prevent
certain features from dominating others, especially in distance-based
algorithms.
Handling Missing Values: Address missing data appropriately
through imputation or deletion to ensure robustness in the model
training process.
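As a brief illustration of these two preprocessing steps, here is a hedged scikit-learn sketch; the tiny example array and the mean-imputation strategy are assumptions chosen for demonstration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 200.0],
              [2.0, np.nan],     # a missing value to be imputed
              [3.0, 600.0]])

X_imputed = SimpleImputer(strategy="mean").fit_transform(X)   # handle missing values
X_scaled = MinMaxScaler().fit_transform(X_imputed)            # scale features to [0, 1]
print(X_scaled)
```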