DM Algorithms (1)
o The K-NN algorithm assumes similarity between the new case/data and the available cases and
puts the new case into the available category it is most similar to.
o The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a well-suited
category by using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but it is mostly
used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data distribution.
o It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs its computation on the dataset only at
the time of classification.
o In the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it
classifies that data into the category most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and
we want to know whether it is a cat or a dog. For this identification, we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will compare the features of the
new image with those of the cat and dog images and, based on the most similar features, put it
in either the cat or the dog category.
Suppose there are two categories, Category A and Category B, and we have a new data point x1.
Which of these categories does this data point belong to? To solve this type of problem, we need a
K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point.
Consider the below diagram:
The working of K-NN can be explained on the basis of the following algorithm:
o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to the training data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
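The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `knn_classify`, the (features, label) data layout, and the toy points are all hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(train, new_point, k=5):
    """Assign new_point to the majority category among its k nearest neighbors.

    `train` is a list of (features, label) pairs; K-NN simply stores it (lazy learning).
    """
    # distance from the new point to every stored training point
    distances = [(euclidean(features, new_point), label) for features, label in train]
    # Step-3: keep the k nearest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step-4: count the neighbors in each category
    votes = Counter(label for _, label in nearest)
    # Step-5: the category with the most neighbors wins
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training points in two categories
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_classify(train, (2.0, 2.0), k=5))  # A: 3 of the 5 nearest neighbors are in A
```

Because `Counter.most_common` orders equal counts by first appearance and the neighbors are counted nearest-first, a tied vote goes to the category of the closer neighbors; choosing an odd K for two categories avoids ties altogether.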
Suppose we have a new data point and we need to put it in the required category. Consider the
below image:
o First, we choose the number of neighbors; here we choose k = 5.
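o Next, we calculate the Euclidean distance between the data points. The Euclidean distance is
the straight-line distance between two points, which we have already studied in geometry. For
two points A(x_1, y_1) and B(x_2, y_2), it is calculated as:
$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$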
o By calculating the Euclidean distances, we get the five nearest neighbors: three in category A
and two in category B. Consider the below image:
o Since 3 of the 5 nearest neighbors are from category A, the new data point is assigned to
category A.
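As a worked example of the distance calculation, take two hypothetical points A(1, 2) and B(4, 6):

```latex
d(A, B) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
```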
Advantages of the KNN algorithm:
o It is simple to implement.
Disadvantages of the KNN algorithm:
o The value of K always needs to be determined, which can sometimes be complex.
o The computation cost is high because the distance from the new data point to every training
sample must be calculated.
Genetic Algorithm in Machine Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's theory of
evolution in Nature." It is used to solve optimization problems in machine learning. It is one
of the important algorithms because it helps solve complex problems that would otherwise take a
long time to solve.
Genetic algorithms are widely used in different real-world applications, for example, designing
electronic circuits, code-breaking, image processing, and artificial creativity.
Population: A population is a subset of all possible (candidate) solutions to the given problem.
Chromosomes: A chromosome is one of the solutions in the population for the given problem; a
collection of genes makes up a chromosome.
Allele: An allele is the value given to a gene within a particular chromosome.
Fitness Function: The fitness function is used to determine each individual's fitness level in
the population, that is, the ability of an individual to compete with other individuals. In every
iteration, individuals are evaluated based on their fitness function.
Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring that are
better than their parents. Genetic operators play the role of changing the genetic composition
of the next generation.
Selection
After calculating the fitness of every individual in the population, a selection process is used
to determine which individuals in the population will get to reproduce and produce the offspring
that will form the next generation.
One common selection style is tournament selection.
The genetic algorithm works on an evolutionary generational cycle to generate high-quality
solutions. These algorithms use different operations that either enhance or replace the
population to give a better-fitting solution.
It involves five phases to solve complex optimization problems, which are given below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the
population. Each individual is a candidate solution to the given problem. An individual contains,
or is characterized by, a set of parameters called genes. Genes are combined into a string to
form a chromosome, which represents a solution to the problem. One of the most popular
techniques for initialization is the use of random binary strings.
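A minimal Python sketch of initialization with random binary strings follows; the function name `init_population`, the population size, and the chromosome length are illustrative assumptions.

```python
import random


def init_population(pop_size, n_genes, seed=None):
    """Create pop_size random binary chromosomes, each a list of n_genes genes (0 or 1)."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


population = init_population(pop_size=4, n_genes=8, seed=42)
for chromosome in population:
    print("".join(map(str, chromosome)))  # e.g. one 8-character bit string per individual
```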
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, that is, the ability of an
individual to compete with other individuals. In every iteration, individuals are evaluated
based on their fitness function. The fitness function provides a fitness score to each
individual, and this score determines the probability of being selected for reproduction.
The higher the fitness score, the greater the chance of being selected for reproduction.
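The sketch below assumes a hypothetical "OneMax" fitness function (the fitness is simply the number of 1-genes) and turns fitness scores into fitness-proportionate selection probabilities, so a higher score means a higher chance of being chosen. Both choices are illustrative; a real problem defines its own fitness function.

```python
def fitness(chromosome):
    """Hypothetical OneMax fitness: the number of 1-genes in the chromosome."""
    return sum(chromosome)


def selection_probabilities(population):
    """Fitness-proportionate probabilities: each score divided by the total score."""
    scores = [fitness(c) for c in population]
    total = sum(scores)
    if total == 0:
        # every individual is equally unfit: fall back to equal chances
        return [1 / len(population)] * len(population)
    return [s / total for s in scores]


population = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0]]
print([fitness(c) for c in population])  # [3, 1, 2]
print(selection_probabilities(population))
```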
3. Selection
The selection phase involves selecting individuals for the reproduction of offspring. The
selected individuals are then arranged in pairs for reproduction, and these individuals
transfer their genes to the next generation. Common selection methods include:
o Tournament selection
o Rank-based selection
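The two selection methods listed above can be sketched as follows, again assuming the hypothetical OneMax fitness. In tournament selection, a few random individuals compete and the fittest wins; in rank-based selection, individuals are sorted by fitness and chosen with probability proportional to their rank rather than their raw score. The tournament size `k=3` is an assumption.

```python
import random


def fitness(chromosome):
    """Hypothetical OneMax fitness: the number of 1-genes."""
    return sum(chromosome)


def tournament_select(population, rng, k=3):
    """Pick k random individuals and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


def rank_select(population, rng):
    """Rank-based selection: the least fit gets weight 1, the fittest gets weight N."""
    ranked = sorted(population, key=fitness)   # ascending fitness
    weights = range(1, len(ranked) + 1)        # ranks 1..N used as weights
    return rng.choices(ranked, weights=weights, k=1)[0]


rng = random.Random(0)
population = [[0, 0, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
# arrange selected individuals in pairs of parents for reproduction
parents = [(tournament_select(population, rng), tournament_select(population, rng))
           for _ in range(2)]
print(parents)
print(rank_select(population, rng))
```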
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this
step, the genetic algorithm uses two variation operators that are applied to the parent
population. The two operators involved in the reproduction phase are given below:
o Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the genes.
Then the crossover operator swaps the genetic information of two parents from the current
generation to produce a new individual representing the offspring.
The genes of the parents are exchanged among themselves up to the crossover point.
These newly generated offspring are added to the population. This process is also called
recombination. Types of crossover styles available:
o Two-point crossover
o Uniform crossover
o Mutation: The mutation operator inserts random genes in the offspring (new child) to maintain
diversity in the population. It can be done by flipping some bits in the chromosomes.
Mutation helps solve the issue of premature convergence and enhances diversification.
The below image shows the mutation process:
Types of mutation styles available:
o Gaussian mutation
o Exchange/Swap mutation
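The following sketch shows the crossover and mutation operators described above on list-encoded chromosomes: one-point and two-point crossover, bit-flip and swap mutation for discrete genes, and Gaussian mutation for real-valued genes. Function names, the mutation rate, and `sigma` are illustrative assumptions.

```python
import random


def one_point_crossover(p1, p2, rng):
    """Exchange the genes of two parents after one random crossover point."""
    point = rng.randint(1, len(p1) - 1)  # cut strictly inside the chromosome
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point_crossover(p1, p2, rng):
    """Exchange the segment lying between two random crossover points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def bit_flip_mutation(chromosome, rng, rate=0.1):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def swap_mutation(chromosome, rng):
    """Exchange/swap mutation: swap the genes at two random positions."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def gaussian_mutation(genes, rng, sigma=0.1):
    """Gaussian mutation for real-valued genes: add N(0, sigma^2) noise to each gene."""
    return [g + rng.gauss(0.0, sigma) for g in genes]


rng = random.Random(7)
parent1, parent2 = [0] * 6, [1] * 6
child1, child2 = one_point_crossover(parent1, parent2, rng)
print(child1, child2)
print(two_point_crossover(parent1, parent2, rng))
print(bit_flip_mutation(child1, rng, rate=0.2))
print(swap_mutation([1, 2, 3, 4], rng))
print(gaussian_mutation([0.5, 1.5], rng))
```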
5. Termination
After the reproduction phase, a stopping criterion is applied as the basis for termination. The
algorithm terminates once a solution reaches the threshold fitness, and the best solution in the
population is taken as the final solution.
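Putting the five phases together gives a compact end-to-end loop. This is a sketch under several assumptions: the hypothetical OneMax fitness, tournament selection of size 3, one-point crossover, bit-flip mutation, keeping the best individual unchanged in each generation (elitism, a common design choice not described in the text), and a generation limit as a fallback stopping criterion alongside the threshold fitness.

```python
import random


def fitness(chromosome):
    """Hypothetical OneMax fitness: number of 1-genes (maximum = chromosome length)."""
    return sum(chromosome)


def run_ga(n_genes=12, pop_size=20, threshold=None, max_generations=200,
           mutation_rate=0.05, seed=0):
    """Evolve binary chromosomes until the threshold fitness or the generation limit is hit."""
    rng = random.Random(seed)
    threshold = n_genes if threshold is None else threshold
    # 1. Initialization: random binary strings
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for generation in range(max_generations):
        # 2. Fitness assignment, then 5. Termination check on the best individual
        best = max(population, key=fitness)
        if fitness(best) >= threshold:
            return best, generation
        next_population = [best]  # elitism: carry the current best over unchanged
        while len(next_population) < pop_size:
            # 3. Selection: a size-3 tournament for each parent
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # 4. Reproduction: one-point crossover, then bit-flip mutation
            point = rng.randint(1, n_genes - 1)
            child = p1[:point] + p2[point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    # generation limit reached: report the best individual found so far
    return max(population, key=fitness), max_generations


best, generations = run_ga()
print(best, fitness(best), generations)
```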
Limitations of genetic algorithms:
o Genetic algorithms are not efficient for solving simple problems.
Difference between genetic algorithms and traditional algorithms:
o A search space is the set of all possible solutions to the problem. A traditional algorithm
maintains only one set of solutions, whereas a genetic algorithm can use several sets of
solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas genetic
algorithms need only one objective function to calculate the fitness of an individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can (the fitness of
each individual is calculated independently).
o One big difference between traditional and genetic algorithms is that, instead of operating
directly on candidate solutions, genetic algorithms operate on their representations (or
encodings), often referred to as chromosomes.
o Traditional Algorithms can only generate one result in the end, whereas Genetic Algorithms
can generate multiple optimal results from different generations.
o Traditional algorithms are not more likely to generate optimal results. Genetic algorithms do
not guarantee a globally optimal result either, but there is a good chance of getting the
optimal result for a problem because they use genetic operators such as crossover and mutation.
Decision Tree Classification Algorithm
o In a decision tree, there are two types of nodes: the decision node and the leaf node.
Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are
the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of features of the given dataset.
o It is called a decision tree because, like a tree, it starts with the root node, which
expands into further branches and constructs a tree-like structure.
o To build a tree, we can use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
There are various algorithms in machine learning, so choosing the best algorithm for the given
dataset and problem is the main point to remember when creating a machine learning model. Below
are two reasons for using a decision tree:
o Decision trees usually mimic human thinking when making a decision, so they are easy to
understand.
o The logic behind a decision tree can be easily understood because it shows a tree-like
structure.
Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split further after
reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according
to the given conditions.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is the topmost parent node, and the other nodes are
child nodes. More generally, a node that is split into sub-nodes is the parent of those sub-nodes.
In a decision tree, to predict the class of a given record, the algorithm starts from the root
node of the tree. It compares the value of the root attribute with the corresponding attribute of
the record (from the real dataset) and, based on the comparison, follows the branch and jumps to
the next node.
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets, one for each possible value of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
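The steps above can be sketched as a small recursive builder. This sketch uses information gain (entropy reduction) as the Attribute Selection Measure and makes one branch per attribute value, in the style of ID3; CART, mentioned earlier, instead uses Gini impurity and binary splits. The job-offer records and attribute names are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on attribute `attr`."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Recursively build a tree of nested dicts whose leaves are class labels."""
    # Leaf node: every record has the same class, or no attribute is left to split on
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step-2: choose the best attribute with the ASM (information gain in this sketch)
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {best: {}}
    # Step-3 to Step-5: one subset per value of the best attribute, each built recursively
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node[best][value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return node


# Hypothetical job-offer records (attribute names and values are illustrative only)
rows = [
    {"salary": "high", "commute": "short", "cab": "no"},
    {"salary": "high", "commute": "long", "cab": "yes"},
    {"salary": "high", "commute": "long", "cab": "no"},
    {"salary": "low", "commute": "short", "cab": "yes"},
    {"salary": "low", "commute": "long", "cab": "yes"},
    {"salary": "low", "commute": "short", "cab": "no"},
    {"salary": "high", "commute": "short", "cab": "yes"},
    {"salary": "high", "commute": "short", "cab": "no"},
]
labels = ["accept", "accept", "decline", "decline", "decline", "decline", "accept", "accept"]
tree = build_tree(rows, labels, ["salary", "commute", "cab"])
print(tree)
```

On this toy data, salary gives the largest information gain and becomes the root node, mirroring the structure of the job-offer example.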
Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept
the offer or not. To solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The next decision node
further splits into one decision node (cab facility) and one leaf node. Finally, that decision
node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
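The job-offer example can be encoded as nested dictionaries and classified by walking from the root node to a leaf, following the branch that matches each attribute value. The attribute names (`salary_in_range`, `long_commute`, `cab_facility`) and which branch leads to which outcome are hypothetical choices for illustration.

```python
def predict(tree, record):
    """Start at the root, test the node's attribute, and follow the matching branch."""
    while isinstance(tree, dict):
        # a decision node looks like {attribute: {attribute_value: subtree, ...}}
        attribute, branches = next(iter(tree.items()))
        tree = branches[record[attribute]]
    return tree  # a leaf node holds the final decision


# Hypothetical encoding of the job-offer tree
job_offer_tree = {
    "salary_in_range": {                 # root node
        "no": "Declined offer",
        "yes": {
            "long_commute": {            # decision node: distance from the office
                "no": "Accepted offer",
                "yes": {
                    "cab_facility": {    # decision node: cab facility
                        "yes": "Accepted offer",
                        "no": "Declined offer",
                    }
                },
            }
        },
    }
}

candidate = {"salary_in_range": "yes", "long_commute": "yes", "cab_facility": "yes"}
print(predict(job_offer_tree, candidate))  # Accepted offer
```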
Artificial Neural Network Tutorial
Artificial Neural Network Tutorial provides basic and advanced concepts of ANNs. Our Artificial
Neural Network tutorial is developed for beginners as well as professionals.
The term "artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An artificial neural network is a computational network
based on the biological neural networks that make up the structure of the human brain. Just as
the human brain has neurons interconnected with each other, artificial neural networks also have
neurons that are linked to each other in various layers of the network. These neurons are known
as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network. In this
tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing map, Building
blocks, unsupervised learning, Genetic algorithm, etc.
The parts of a biological neural network correspond to those of an artificial neural network
as follows:
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Synapse                      Weights
Axon                         Output
In the field of artificial intelligence, an artificial neural network attempts to mimic the
network of neurons that makes up a human brain, so that computers can understand things and
make decisions in a human-like manner. An artificial neural network is designed by programming
computers to behave like interconnected brain cells.
There are roughly 86 billion neurons in the human brain (a figure often rounded to 100 billion).
Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data
is stored in a distributed manner, and we can extract more than one piece of this data from our
memory in parallel when necessary. We can say that the human brain is made up of remarkably
powerful parallel processors.
We can understand the artificial neural network with an example. Consider a digital logic gate
that takes inputs and gives an output, such as an "OR" gate, which takes two inputs. If one or
both inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here the
output depends only on the input. Our brain does not work this way: the relationship between
outputs and inputs keeps changing because the neurons in our brain are "learning."
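The OR-gate behavior can be learned rather than hard-coded. The sketch below trains a single perceptron with the classic perceptron learning rule on the OR truth table; the learning rate, number of epochs, and zero initial weights are illustrative assumptions.

```python
def step(z):
    """Threshold activation: the neuron fires (1) only when its weighted input is positive."""
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=10, lr=0.1):
    """Perceptron learning rule: nudge the weights toward the desired output."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = step(w[0] * x1 + w[1] * x2 + b)
            error = target - output
            # the weights only change when the prediction is wrong (error != 0)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_TABLE)
print([step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in OR_TABLE])  # [0, 1, 1, 1]
```

With these settings the weights settle at (0.1, 0.1) and the bias at 0.0, so only the input (0, 0) stays at or below the firing threshold.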
To understand the architecture of an artificial neural network, we have to understand what a
neural network consists of. A neural network consists of a large number of artificial neurons,
called units, arranged in a sequence of layers. Let us look at the various types of layers
available in an artificial neural network.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations to
find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in
output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and includes
a bias. This computation is represented in the form of a transfer function.
The weighted total is passed as input to an activation function to produce the output.
Activation functions decide whether a node should fire or not. Only the nodes that fire pass their
signal on to the output layer. Different activation functions are available, and the choice
depends on the sort of task we are performing.
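The weighted sum, bias, and activation described above can be illustrated for a single node. The input values, weights, and bias below are hypothetical; a hard step function and the smooth sigmoid are shown as two common activation choices.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Transfer function: the sum of w_i * x_i over all inputs, plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(z):
    """Hard threshold: the node fires (1) only when z is positive."""
    return 1 if z > 0 else 0


def sigmoid(z):
    """Smooth activation squashing z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


inputs = [0.5, 1.0]     # hypothetical input values
weights = [0.4, -0.2]   # hypothetical weights
bias = 0.1
z = weighted_sum(inputs, weights, bias)
print(z, step(z), sigmoid(z))
```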
Advantages of artificial neural networks:
Parallel processing capability:
Artificial neural networks have the numerical strength to perform more than one task
simultaneously.
Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, an artificial neural network
stores information across the whole network. The disappearance of a few pieces of data in one
place does not prevent the network from working.
Capability to work with incomplete knowledge:
After training, an ANN may produce output even with incomplete information. The loss of
performance here depends on the importance of the missing data.
Learning from examples:
For an ANN to be able to adapt, it is important to choose suitable examples and to train the
network toward the desired output by showing it these examples. The success of the network is
directly proportional to the chosen examples; if a case cannot be shown to the network in all its
aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this
feature makes the network fault-tolerant.
Disadvantages of artificial neural networks:
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is found through experience, trial, and error.
Unrecognized behavior of the network:
This is the most significant issue of ANNs. When an ANN produces a solution, it does not provide
insight into why and how it reached that solution, which decreases trust in the network.
Hardware dependence:
Because of their structure, artificial neural networks need processors with parallel processing
power. Therefore, realizing an ANN depends on suitable hardware.
Difficulty of showing the problem to the network:
ANNs work with numerical data, so problems must be converted into numerical values before being
introduced to the ANN. The representation chosen here directly impacts the performance of the
network, and it relies on the user's abilities.
There are various types of artificial neural networks (ANNs). Modeled on the neurons and network
functions of the human brain, an artificial neural network performs tasks in a similar way. Most
artificial neural networks have some similarities with their more complex biological counterparts
and are very effective at their intended tasks, for example, segmentation or classification.
Feedback ANN:
In this type of ANN, the output is fed back into the network to achieve the best-evolved results
internally. According to the University of Massachusetts, Lowell Centre for Atmospheric Research,
feedback networks feed information back into themselves and are well suited to solving
optimization problems. Internal system error corrections use feedback ANNs.
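One simple form of feedback is a recurrent neuron whose previous output re-enters as an extra input at the next step. The sketch below (an Elman-style recurrence with a `tanh` activation; the function name and weights are hypothetical) is only meant to show the feedback loop, not the specific networks used for optimization.

```python
import math


def feedback_neuron(sequence, w_in, w_back, bias, h0=0.0):
    """A single neuron whose previous output is fed back in as an extra input."""
    h = h0
    outputs = []
    for x in sequence:
        # the new input x and the fed-back output h are combined in one weighted sum
        h = math.tanh(w_in * x + w_back * h + bias)
        outputs.append(h)
    return outputs


print(feedback_neuron([1.0, 1.0, 1.0], w_in=0.5, w_back=0.9, bias=0.0))
```

Even though the input is identical at every step, the outputs differ because each one depends on the previous output; setting `w_back=0.0` removes the feedback and makes every output the same.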
Feed-Forward ANN:
A feed-forward network is a basic neural network consisting of an input layer, an output layer,
and at least one layer of neurons. By assessing its output with respect to its input, the
strength of the network can be observed from the group behavior of the associated neurons, and
the output is decided. The primary advantage of this network is that it learns to evaluate and
recognize input patterns.
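A feed-forward pass can be sketched as follows: signals flow from the input layer through a hidden layer to the output layer, and no output is ever fed back. The layer sizes, weights, and biases are hypothetical, and every neuron uses a sigmoid activation for simplicity.

```python
import math


def sigmoid(z):
    """Smooth activation squashing z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(inputs, layers):
    """Pass the input through each (weights, biases) layer in order, never looping back."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron
network = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.6, -0.3, 0.8]], [0.05]),                                  # output layer
]
print(feed_forward([1.0, 0.5], network))
```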