Algorithms - Reading Assignment
Naive Bayes Classifier
The Naive Bayes classifier is a family of classifiers that uses Bayes' theorem of probability to build machine
learning models. This classifier is particularly effective for disease prediction and document
classification. The basic assumption of the Naive Bayes algorithm is that all the features are
independent of each other (conditional on the class).
Bayes' theorem gives a way to calculate the posterior probability P(A|B) from P(A), P(B), and P(B|A):

P(A|B) = P(B|A) × P(A) / P(B)
where P(A|B) is the posterior probability of A given B, P(A) is the prior probability of A, P(B|A) is the
likelihood, i.e. the probability of B given A, and P(B) is the prior probability of B. In simple
English this can be written as:

Posterior = (Likelihood × Prior) / Evidence
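To make this concrete, here is a minimal sketch of a Naive Bayes document classifier, assuming scikit-learn is available; the tiny corpus and labels are invented purely for illustration.

# Word-count features + multinomial Naive Bayes, which assumes the
# features (word counts) are independent given the class.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills buy now", "meeting agenda attached",
        "win a free prize now", "project status report"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)      # likelihoods come from these counts

model = MultinomialNB()
model.fit(X, labels)                    # learns priors and likelihoods

test = vectorizer.transform(["free pills now"])
print(model.predict(test))              # most probable class, likely ['spam']
print(model.predict_proba(test))        # posterior probability of each class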
K-Means Clustering Algorithm
K-Means is a popular unsupervised ML algorithm for cluster analysis. It is a non-deterministic,
iterative method that operates on a given data set with a pre-defined number of clusters, k. The
output of the K-Means algorithm is k clusters, with the input data partitioned among them. For
instance, consider K-Means clustering of Wikipedia search results. The search term "Jaguar" on
Wikipedia will return all pages containing the word Jaguar, which can refer to Jaguar the car, Jaguar
the Mac OS version, or the jaguar as an animal. The K-Means clustering algorithm can be applied to
group the web pages that discuss similar concepts: it will group all the web pages that refer to the
jaguar as an animal into one cluster, Jaguar the car into another cluster, and so on.
Any new incoming data point is then classified according to its proximity to the existing clusters.
Data points inside a cluster exhibit similar characteristics, while points in different clusters have
different properties. A typical example of clustering is grouping similar customers into a segment for
a marketing campaign; it is also a practical algorithm for document clustering.
Let's say we have x1, x2, …, xn as our inputs, and we want to split them into k clusters.
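A minimal sketch of the K-Means iteration (Lloyd's algorithm) with NumPy follows; the random data, k = 2, and the fixed iteration budget are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # the inputs x1, ..., xn as rows
k = 2

# Start from k randomly chosen points as initial centroids.
centroids = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(10):             # a production version would test convergence
    # Assignment step: each point joins the cluster of its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centroids)                # final cluster centres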
Support Vector Machines
Support Vector Machines are a set of supervised learning methods used for classification, regression,
and outlier detection. An SVM organizes the data into different categories by finding a hyperplane
(in two dimensions, a line) that separates the training data set into classes. As many such separating
hyperplanes may exist, the SVM algorithm maximizes the distance between the hyperplane and the
classes involved, which is referred to as margin maximization. If the hyperplane that maximizes the
distance between the classes is identified, the probability of generalizing well to unseen data is increased.
For the simplest case, imagine we have two tags, red and blue, and our data has two features, x
and y. We want a classifier that, given a pair of (x, y) coordinates, outputs whether the point is red
or blue. We plot our already-labelled training data on a plane:
SVM takes these data points and outputs the hyperplane (which in two dimensions is simply a
line) that best separates the tags. This line is the decision boundary: anything that falls to one side
of it we will classify as blue, and anything that falls to the other as red.
What exactly is the best hyperplane? It is the one that maximizes the margins from both tags. In
other words, the hyperplane whose distance to the nearest element of each tag is the largest.
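A minimal sketch of this idea, assuming scikit-learn is available; the red/blue coordinates below are invented purely for illustration.

from sklearn.svm import SVC

X = [[1, 2], [2, 3], [2, 1],    # "red" points
     [6, 5], [7, 7], [8, 6]]    # "blue" points
y = ["red", "red", "red", "blue", "blue", "blue"]

# A linear kernel searches for the maximum-margin separating hyperplane;
# C controls how heavily margin violations are penalized.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[3, 2], [7, 6]]))   # classify new (x, y) points
print(clf.coef_, clf.intercept_)       # parameters of the learned hyperplane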
Advantages of Using SVM
1. SVM can offer high classification accuracy on the training dataset.
2. SVM tends to classify future, unseen data correctly and efficiently.
3. SVM does not make strong assumptions about the data.
4. Thanks to margin maximization, SVM is comparatively resistant to overfitting.
Apriori Algorithm
The Apriori algorithm is an unsupervised ML algorithm that generates association rules from a given data
set. An association rule implies that if item A occurs, then item B also occurs with a certain
probability. Most of the generated association rules are in an IF-THEN format. For example: IF
people buy an iPad, THEN they also buy an iPad case to protect it. For the algorithm to derive such
conclusions, it first counts the number of people who bought an iPad case while purchasing an
iPad. A ratio is then derived, such as: out of the 100 people who purchased an iPad, 85 also
purchased an iPad case.
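The arithmetic behind such a rule can be sketched in a few lines of Python; the transactions below are invented for illustration, and a full Apriori implementation would additionally prune candidate itemsets by a minimum-support threshold.

transactions = [
    {"iPad", "iPad case"}, {"iPad"}, {"iPad", "iPad case", "pen"},
    {"pen"}, {"iPad", "iPad case"},
]

n = len(transactions)
ipad = sum("iPad" in t for t in transactions)
both = sum({"iPad", "iPad case"} <= t for t in transactions)

support = both / n          # how often the pair occurs across all baskets
confidence = both / ipad    # of the iPad buyers, the share who also buy a case
print(f"support={support:.2f}, confidence={confidence:.2f}")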
Logistic Regression
The logistic regression algorithm is used to estimate discrete values in classification tasks, not
regression problems. The word 'regression' here implies that a linear model is fit in the feature
space. The algorithm applies the logistic function to a linear combination of features to predict the
outcome of a categorical dependent variable based on predictor variables. The odds or probabilities
that describe the outcome of a single trial are modelled as a function of the explanatory variables.
This makes it possible to estimate the probability of a case falling into a specific level of the
categorical dependent variable given the predictor variables.
Suppose you want to predict whether there will be rainfall tomorrow in Thyolo. The prediction
outcome is not a continuous number, because there will either be rainfall or no rainfall, so simple
linear regression cannot be applied. The outcome variable is one of several categories, and
logistic regression helps.
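A minimal sketch of the underlying computation in pure Python; the features (say, humidity and pressure readings) and the weights are invented purely for illustration, not fitted values.

import math

def rain_probability(features, weights, bias):
    z = sum(w * x for w, x in zip(weights, features))  # linear combination
    return 1.0 / (1.0 + math.exp(-(z + bias)))         # logistic (sigmoid) function

p = rain_probability([0.9, -1.2], weights=[2.0, 0.5], bias=-0.3)
print(f"P(rain tomorrow) = {p:.2f}")    # threshold at 0.5 for a rain/no-rain label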
Decision Tree
A decision tree is a graphical representation that uses a branching methodology to exemplify
all possible outcomes of a decision, based on certain conditions. In a decision tree, each internal node
represents a test on an attribute, each branch represents an outcome of that test, and each
leaf node represents a class label, i.e. the decision made after evaluating all of the attributes.
The classification rules are represented by the paths from the root to the leaf nodes.
Here is an example:
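Assuming scikit-learn is available, a small tree can be sketched in code; the toy weather features and labels below are invented purely for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[30, 85], [25, 60], [18, 90], [22, 70]]   # [temperature, humidity]
y = ["no play", "play", "no play", "play"]

tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)

# Internal nodes test an attribute; leaves carry the class labels.
print(export_text(tree, feature_names=["temperature", "humidity"]))
print(tree.predict([[24, 65]]))                # follow a root-to-leaf path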
Advantages of Decision Trees
4. When a decision tree is fit to a training dataset, the nodes at the top on which the tree is split
are considered the important variables within the dataset, so feature selection is completed by default.
5. Decision trees save data preparation time, as they are not sensitive to missing values and
outliers. Missing values will not stop you from splitting the data when building a decision tree,
and outliers will not affect the tree much, because data splitting happens based on some samples
within the split range and not on exact absolute values.
Random Forest
Random Forest is a go-to algorithm that uses a bagging approach to create a collection of decision
trees, each built from a random subset of the data. The model is trained several times on random
samples of the dataset to achieve good prediction performance. In this ensemble learning method,
the outputs of all the decision trees in the random forest are combined to make the final prediction:
the results of the individual trees are polled, and the prediction that appears most often across the
trees is selected.
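A minimal sketch, assuming scikit-learn is available; the synthetic dataset is generated purely for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 100 trees, each fit on a bootstrap sample with random feature subsets;
# the final class is the majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict(X[:3]))   # ensemble predictions for three samples
print(forest.score(X, y))      # accuracy on the training data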
Advantages of Random Forest
4. Random Forest is not very sensitive to the parameters that are used to run the algorithm; one
can easily build a decent model without much tuning.
5. The trees of a Random Forest can be grown in parallel.
6. This algorithm runs efficiently on large databases.
7. It generally achieves high classification accuracy.
Artificial Neural Networks
ANNs consist of input, hidden, and output layers of connected neurons (nodes), loosely simulating the
human brain. Each connection, like the synapses in a biological brain, can transmit a signal to other
neurons. An artificial neuron receives signals, processes them, and can signal the neurons connected to
it. The "signal" at a connection is a real number, and the output of each neuron is computed by some
non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges
typically have a weight that adjusts as learning proceeds. The weight increases or decreases the
strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if
the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different
layers may perform different transformations on their inputs. Signals travel from the first layer (the
input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
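A minimal sketch of one forward pass through such a network with NumPy; the layer sizes are arbitrary and the weights are random stand-ins, not trained values.

import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 0.3])                 # input layer: 3 features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # edges into 4 hidden neurons
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)  # edges into 2 output neurons

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # non-linear activation

# Each neuron computes a non-linear function of the weighted sum of its inputs.
hidden = sigmoid(x @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)
print(output)                                  # signals at the output layer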
Disadvantages of ANNs
1. The lack of rules for determining the proper network structure means that an appropriate
artificial neural network architecture can only be found through trial and error and experience.
2. The requirement of processors with parallel processing abilities makes neural networks
hardware-dependent.
3. The network works with numerical information; therefore, all problems must be translated
into numerical values before they can be presented to the ANN.
4. The lack of explanation behind the solutions it produces is one of the biggest disadvantages of
ANNs. The inability to explain why or how the network arrived at a solution generates a lack of
trust in the network.
Applications of ANNs
• Image recognition
• Chatbots
• Natural language processing, translation and language generation
• Stock market prediction
• Delivery driver route planning and optimization
• Drug discovery and development.
K-Nearest Neighbors
KNN is among the most straightforward classification algorithms; it is also used for regression, i.e.
the prediction of continuous values. K-Nearest Neighbors uses distance-based measures to make a
prediction, and the final prediction is chosen based on the k nearest neighbours. Common distance
measures are the Euclidean, Manhattan, Minkowski, and Hamming distances; the first three are
continuous functions, while the Hamming distance is used for categorical variables. Choosing the
value of k is the most essential task in this algorithm. KNN is often referred to as a lazy learner: a
lazy learning algorithm generalizes from the data only after a query is made.
Steps in KNN:
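In outline: measure the distance from the query point to every training point, keep the k nearest, and take a majority vote over their labels (or their average, for regression). A minimal sketch in pure Python follows; the points, labels, and k = 3 are invented purely for illustration.

from collections import Counter
import math

train = [((1, 2), "red"), ((2, 1), "red"), ((6, 5), "blue"), ((7, 6), "blue")]

def knn_predict(query, train, k=3):
    # Step 1: measure the distance from the query to every training point
    # (Euclidean here; Manhattan, Minkowski, or Hamming are alternatives).
    dists = [(math.dist(query, point), label) for point, label in train]
    # Step 2: keep the k nearest neighbours.
    nearest = sorted(dists)[:k]
    # Step 3: majority vote over their labels (average them for regression).
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((2, 2), train))   # -> 'red'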
Polynomial Regression
Polynomial regression is a form of linear regression, a special case of multiple linear regression,
that estimates the relationship as an nth-degree polynomial. Instead of assuming a linear relation
between the feature variable x and the target variable y, it uses a polynomial expression to describe
the relationship. The polynomial regression model is

yi = β0 + β1xi + β2xi² + … + βnxiⁿ + εi

where εi is an unobserved random error with mean zero conditioned on xi, and β0, β1, … are
unknown parameters/coefficients.
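A minimal sketch with NumPy: fit a degree-2 polynomial by least squares. The noisy quadratic data and the true coefficients are synthetic, chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
# True model: y = 1.0 + 0.5x - 2.0x² plus random error with mean zero.
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.5, size=x.size)

coeffs = np.polyfit(x, y, deg=2)   # estimated betas, highest degree first
print(coeffs)                      # should be close to [-2.0, 0.5, 1.0]

y_hat = np.polyval(coeffs, x)      # fitted values of the polynomial model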