Naive Bayes Classifier
Computers are great at working with standardized and structured data like
database tables and financial records. They are able to process that data much faster than
we humans can. But we humans don’t communicate in “structured data” nor do we speak
binary! We communicate using words, a form of unstructured data.
Unfortunately, computers suck at working with unstructured data because there are
no standardized techniques to process it. When we program computers using something
like C++, Java, or Python, we are essentially giving the computer a set of rules that it should
operate by. With unstructured data, these rules are quite abstract and challenging to define
concretely.
Humans have been writing things down for thousands of years. Over that time, our
brain has gained a tremendous amount of experience in understanding natural language.
When we read something written on a piece of paper or in a blog post on the internet, we
understand what that thing really means in the real-world. We feel the emotions that
reading that thing elicits and we often visualize how that thing would look in real life.
That being said, recent advances in Machine Learning (ML) have enabled computers
to do quite a lot of useful things with natural language! Deep Learning has enabled us to
write programs that perform tasks like language translation, semantic understanding, and
text summarization. All of these add real-world value, making it easy for you to
understand and perform computations on large blocks of text without manual effort.
Components of NLP
NLP involves two main components −
Natural Language Understanding (NLU) − mapping the given input in natural language
into useful representations and analyzing different aspects of the language.
Natural Language Generation (NLG) − producing meaningful phrases and sentences in
natural language from some internal representation.
The process of reading and understanding language is far more complex than it
seems at first glance. There are many things that go into truly understanding what a piece
of text means in the real world. For example, what do you think the following piece of text
means?
“Steph Curry was on fire last night. He totally destroyed the other team”
To a human it’s probably quite obvious what this sentence means. We know Steph
Curry is a basketball player; and even if we don’t, we know that he plays on some kind of
team, probably a sports team. When we see “on fire” and “destroyed”, we know that it
means Steph Curry played really well last night and beat the other team.
Computers tend to take things a bit too literally. Viewing things literally like a
computer, we would see “Steph Curry” and based on the capitalisation assume it’s a person,
place, or otherwise important thing, which is great! But then we see that Steph Curry “was
on fire”… A computer might tell you that someone literally lit Steph Curry on fire
yesterday! Yikes. After that, the computer might say that Mr. Curry has physically
destroyed the other team… they no longer exist, according to this computer. Great…
Difficulties in NLU
Lexical ambiguity − a single word can have more than one meaning.
Syntax-level ambiguity − a sentence can be parsed in more than one way.
Referential ambiguity − it may be unclear what a pronoun or phrase refers to.
NLP Terminology
Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
Steps in NLP
Semantic Analysis − It draws the exact meaning, or the dictionary meaning, from the
text. The text is checked for meaningfulness. This is done by mapping syntactic
structures to objects in the task domain. The semantic analyzer disregards
sentences such as “hot ice-cream”.
Discourse Integration − The meaning of any sentence depends upon the meaning
of the sentence just before it. In addition, it also influences the meaning of the
immediately succeeding sentence.
There are a number of algorithms researchers have developed for syntactic analysis, but
we consider only the following simple methods −
Context-Free Grammar
Top-Down Parser
Generative grammar − a formally specified grammar that can generate all and only
the acceptable sentences of a natural language. The internal structure of such
sentences can be bracketed.
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of
each rewrite rule.

Rules −
S -> NP VP
VP -> V
VP -> V NP
VP -> V VP
VP -> VP PP
NP -> NP PP
PP -> P NP

Lexicon −
V -> can | fish
NP -> fish | rivers | pools | December | Scotland | it | they
The parse tree breaks down the sentence into structured parts so that the computer can
easily understand and process it. In order for the parsing algorithm to construct this parse
tree, a set of rewrite rules, which describe what tree structures are legal, needs to be
constructed.
These rules say that a certain symbol may be expanded in the tree into a sequence of other
symbols. For example, if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP),
then the string formed by NP followed by VP is a sentence. The rewrite rules for the
sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The lexicon expands the pre-terminal categories (DET, ADJ, N, V) into terminal symbols,
the actual words of the sentence.
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks",
sentences such as "The bird peck the grains" can be wrongly permitted, i.e. the subject-
verb agreement error is accepted as correct.
Demerits −
For example, consider “The dog entered my room. It scared me”. If the computer is asked
“Who scared you?”, it should answer “the dog”, not “it”.
To bring out high precision, multiple sets of grammar need to be prepared. It may
require completely different sets of rules for parsing singular and plural
variations, passive sentences, etc., which can lead to the creation of a huge set of
rules that is unmanageable.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence until it
consists entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is
started over again with a different set of rules. This is repeated until a specific rule is found
which describes the structure of the sentence.
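The top-down procedure described above can be sketched as a tiny recursive-descent parser. The grammar, lexicon, and example sentences below are illustrative assumptions, not a complete parser:

```python
# A minimal top-down (recursive-descent) parser for the toy grammar
# S -> NP VP, NP -> DET N, VP -> V NP. Grammar and lexicon are
# illustrative assumptions.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "DET": {"a", "the"},
    "N":   {"bird", "grains"},
    "V":   {"pecks", "peck"},
}

def parse(symbol, words, pos):
    """Try to expand `symbol` starting at words[pos].
    Returns the next position on success, or None on failure."""
    if symbol in LEXICON:                      # terminal category
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:         # try each rewrite rule
        cur = pos
        for part in production:
            cur = parse(part, words, cur)
            if cur is None:
                break                          # this rule failed; try the next
        else:
            return cur
    return None

def accepts(sentence):
    words = sentence.lower().split()
    return parse("S", words, 0) == len(words)

print(accepts("the bird pecks the grains"))  # True
print(accepts("bird the pecks grains"))      # False
```

Note that this parser would also accept "the bird peck the grains", which illustrates the subject-verb agreement weakness of plain context-free grammars discussed earlier.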
Demerits − It is very time consuming; the parser may repeatedly generate and discard
the same structures while backtracking through rules that fail to match the input.
Neural Networks
Neural networks are parallel computing devices, which are basically an attempt to make a
computer model of the brain. The main objective is to develop a system to perform various
computational tasks faster than the traditional systems. These tasks include pattern recognition
and classification, approximation, optimization, and data clustering.
Machine Learning
Machine Learning is a subset of Artificial Intelligence which provides computers with the
ability to learn without being explicitly programmed. In machine learning, we do not have to define
explicitly all the steps or conditions as in any other programming application. Instead, the
machine gets trained on a training dataset large enough to create a model, which helps the
machine take decisions based on its learning.
For example: We want to determine the species of a flower based on its petal and sepal
lengths (parts of a flower) using machine learning. Then, how will we do it?
1. We will feed a flower data set, which contains various characteristics of different flowers
along with their respective species, into our machine. Using this input data set, the
machine will create and train a model which can be used to classify flowers into different
categories.
2. Once our model has been trained, we will pass on a set of characteristics as input to the model.
3. Finally, our model will output the species of the flower present in the new input data set. This
process of training a machine to create a model and use it for decision making is
called Machine Learning.
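The three steps above can be sketched with a tiny nearest-neighbour classifier standing in for the trained model. The measurements and species labels below are made up for illustration:

```python
import numpy as np

# Step 1: a tiny made-up training set of (petal length, sepal length)
# pairs with their species labels. Values are illustrative assumptions.
train_X = np.array([[1.4, 5.1], [1.3, 4.9], [4.7, 7.0], [4.5, 6.4]])
train_y = ["setosa", "setosa", "versicolor", "versicolor"]

def predict(features):
    """Steps 2-3: classify a new flower by its closest training example."""
    dists = np.linalg.norm(train_X - features, axis=1)
    return train_y[int(np.argmin(dists))]

# Classify a new flower from its characteristics:
print(predict(np.array([1.5, 5.0])))  # setosa
```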
1. Consider a line of 100 yards and you have dropped a coin somewhere on the line. Now, it’s
quite convenient for you to find the coin by simply walking on the line. This very line is a
single dimensional entity.
2. Next, consider you have a square of side 100 yards and, yet again, you dropped a coin
somewhere within it. Now, it is quite evident that you are going to take more time to find
the coin within that square as compared to the previous scenario. This square is a
2-dimensional entity.
Mamatha M, SSCASC, Tumkur. Neural Networks, Chapter 8.
3. Let’s take it a step ahead by considering a cube of side 100 yards each and you have dropped
a coin somewhere in between. Now, it is even more difficult to find the coin this time. This
cube is a 3 dimensional entity.
Hence, you can observe the complexity is increasing as the dimensions are increasing. And in
real-life, the high dimensional data that we were talking about has thousands of dimensions
that make it very complex to handle and process. The high dimensional data can easily be found in
use-cases like Image processing, NLP, Image Translation etc.
Machine learning was not capable of solving these use-cases and hence, Deep learning came to
the rescue. Deep learning is capable of handling the high dimensional data and is also efficient in
focusing on the right features on its own. This process is called feature extraction.
Biological Neuron
A nerve cell (neuron) is a special biological cell that processes information. According to
an estimation, there is a huge number of neurons, approximately 10^11, with numerous
interconnections, approximately 10^15.
Schematic Diagram
Dendrites − They are tree-like branches, responsible for receiving information from the
other neurons the neuron is connected to. In a sense, they are like the ears of the
neuron.
Soma − It is the cell body of the neuron and is responsible for processing the
information received from the dendrites.
Axon − It is just like a cable through which the neuron sends information as output;
it is connected to the dendrites of other neurons via synapses.
Synapses − They are the connections between an axon and the dendrites of other
neurons. They transfer information between neurons (electrical-chemical-electrical).
The brain is principally composed of about 10 billion neurons, each connected to about 10,000
other neurons. The neuronal cell bodies (somas) are linked by input and output
channels (dendrites and axons) which connect them.
Each neuron receives electrochemical inputs from other neurons at the dendrites. If the sum
of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an
electrochemical signal along the axon, and passes this signal to the other neurons whose dendrites
are attached at any of the axon terminals. These attached neurons may then fire.
It is important to note that a neuron fires only if the total signal received at the cell body
exceeds a certain level. The neuron either fires or it doesn't, there aren't different grades of firing.
So, our entire brain is composed of these interconnected electro-chemical transmitting
neurons. From a very large number of extremely simple processing units (each performing a
weighted sum of its inputs, and then firing a binary signal if the total input exceeds a certain level)
the brain manages to perform extremely complex tasks.
ANNs are composed of multiple nodes, which imitate the biological neurons of the human
brain. Every node is connected to other nodes through connection links, and they interact
with each other. Each connection link is associated with a weight that carries information
about the input
signal. This is the most useful information for neurons to solve a particular problem because the
weight usually excites (stimulates) or inhibits (prevents) the signal that is being communicated.
Each neuron has an internal state, which is called an activation signal. The nodes can take
input data and perform simple operations on the data. The output signal, called the
activation or node value, is produced after combining the input signals with the activation
rule, and may be sent to other neurons.
The artificial neuron given in this figure has m inputs, denoted x1, x2, ..., xm. Each line
connecting these inputs to the neuron is assigned a weight, denoted w1, w2, ..., wm
respectively. Weights in the artificial model correspond to the synaptic connections in
biological neurons.
The inputs (x) received from the input layer are multiplied with their assigned weights w.
The multiplied values are then added to form the Weighted Sum. The weighted sum of the inputs
and their respective weights are then applied to a relevant Activation Function. The activation
function maps the input to the respective output.
For the above general model of an artificial neural network, the net input can be calculated
as follows −
y_in = x1·w1 + x2·w2 + ... + xm·wm
The output can be calculated by applying the activation function over the net input −
Y = F(y_in)
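As a minimal sketch, the net input and a threshold activation can be computed like this. The input values, weights, and threshold below are illustrative assumptions:

```python
import numpy as np

# Net input y_in = sum of x_i * w_i, followed by a threshold activation.
x = np.array([0.5, 1.0, 0.2])     # inputs x1..xm (illustrative)
w = np.array([0.4, 0.3, 0.9])     # weights w1..wm (illustrative)

y_in = np.dot(x, w)               # net input: weighted sum of the inputs
output = 1 if y_in >= 0.5 else 0  # activation: fire only above a threshold
print(round(y_in, 2), output)     # 0.68 1
```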
If we focus on the structure of a biological neuron, it has dendrites, which are used to
receive inputs. These inputs are summed in the cell body and, via the axon, passed on to
the next biological neuron.
Similarly, a perceptron receives multiple inputs, applies various transformations and functions
and provides an output.
As our brain consists of multiple connected neurons forming a neural network, we can
likewise form a network of artificial neurons, called perceptrons, to build a Deep neural
network. So, let’s move ahead in this Deep Learning Tutorial to understand what a Deep
neural network looks like.
Example: Consider a scenario where you are to build an Artificial Neural Network (ANN) that
classifies images into two classes, say class A and class B.
For example, if the image is composed of 30 by 30 pixels, then the total number of pixels will
be 900. These pixels are represented as matrices, which are then fed into the input layer of the
Neural Network.
Just like how our brains have neurons that help in building and connecting thoughts, an
ANN has perceptrons that accept inputs and process them by passing them on from the input layer
to the hidden and finally the output layer.
As the input is passed from the input layer to the hidden layer, an initial random weight is
assigned to each input. The inputs are then multiplied with their corresponding weights and their
sum is sent as input to the next hidden layer.
Here, a numerical value called bias is assigned to each perceptron, which is associated with
the weightage of each input. Further, each perceptron is passed through activation or a
transformation function that determines whether a particular perceptron gets activated or not.
An activated perceptron is used to transmit data to the next layer. In this manner, the data is
propagated (Forward propagation) through the neural network until the perceptrons reach the
output layer.
At the output layer, a probability is derived which decides whether the data belongs to class
A or class B.
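The forward propagation described above can be sketched as follows; the layer sizes and the randomly initialised weights are illustrative assumptions:

```python
import numpy as np

# Forward propagation through one hidden layer to an output probability
# for class A vs class B. Weights are random illustrative values, as in
# the "initial random weight" step described in the text.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random(900)                  # a flattened 30x30 pixel image
W1 = rng.normal(0, 0.1, (16, 900))   # input -> hidden weights
b1 = np.zeros(16)                    # hidden-layer bias
W2 = rng.normal(0, 0.1, (1, 16))     # hidden -> output weights
b2 = np.zeros(1)                     # output bias

hidden = sigmoid(W1 @ x + b1)        # weighted sums through the activation
prob_a = sigmoid(W2 @ hidden + b2)[0]  # probability the image is class A
print(0.0 < prob_a < 1.0)            # True: the output is a probability
```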
The analogy between a biological neuron and an artificial neuron is as follows −
Soma − Node
Dendrites − Input
Axon − Output
The processing of an ANN depends upon the following building blocks −
Network Topology
Adjustments of Weights or Learning

Network Topology
A network topology is the arrangement of a network along with its nodes and connecting lines.
According to the topology, ANN can be classified as the following kinds −
Feedforward Network
It is a non-recurrent network having processing units/nodes in layers; all the nodes
in a layer are connected with the nodes of the previous layer. The connections carry
different weights. There is no feedback loop, meaning the signal can only flow in one
direction, from input to output. Such networks are used in pattern
generation/recognition/classification. They have fixed inputs and outputs. They may be
divided into the following two types −
Single layer feedforward network − The concept is of feedforward ANN having only one
weighted layer. In other words, we can say the input layer is fully connected to the output
layer.
Multilayer feedforward network − The concept is of feedforward ANN having more than
one weighted layer. As this network has one or more layers between the input and the
output layer, these intermediate layers are called hidden layers.
Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal can
flow in both directions using loops. This makes it a non-linear dynamic system, which changes
continuously until it reaches a state of equilibrium. It may be divided into the following types −
Recurrent networks − They are feedback networks with closed loops. Following are the
two types of recurrent networks.
Fully recurrent network − It is the simplest neural network architecture because all nodes
are connected to all other nodes and each node works as both input and output.
Jordan network − It is a closed-loop network in which the output goes to the input again
as feedback.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher. This
learning process is dependent.
During the training of ANN under supervised learning, the input vector is presented to the
network, which will give an output vector. This output vector is compared with the desired output
vector. An error signal is generated, if there is a difference between the actual output and the
desired output vector. On the basis of this error signal, the weights are adjusted until the actual
output is matched with the desired output.
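This error-driven weight adjustment can be sketched with the classic perceptron learning rule. The toy AND-gate data and the learning rate below are assumptions for illustration:

```python
import numpy as np

# Supervised weight adjustment (perceptron rule): whenever the actual
# output differs from the desired output, the weights are nudged by
# the error signal. Data is a toy AND gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])        # desired outputs (logical AND)
w = np.zeros(2)
b = 0.0
lr = 0.1                                # learning rate (assumed)

for _ in range(20):                     # repeat until outputs match
    for x, t in zip(X, targets):
        y = 1 if x @ w + b > 0 else 0   # actual output
        error = t - y                   # error signal
        w += lr * error * x             # adjust weights using the error
        b += lr * error

print([1 if x @ w + b > 0 else 0 for x in X])  # [0, 0, 0, 1]
```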
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher.
This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of similar type
are combined to form clusters. When a new input pattern is applied, then the neural network gives
an output response indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what the desired output should be or
whether it is correct. Hence, in this type of learning, the network itself must discover the
patterns and features in the input data, and the relation between the input data and the output.
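The clustering of similar input vectors into groups can be sketched with a minimal k-means style procedure; the toy 2-D points are illustrative:

```python
import numpy as np

# Minimal k-means sketch: similar input vectors end up in the same
# cluster, with no desired outputs supplied. Points are illustrative.
def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(
            ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

points = np.array([[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9]])
labels, _ = kmeans(points, k=2)
print(labels)  # the two nearby points on each side share a cluster label
```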
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network
using some critic information. This learning process is similar to supervised learning;
however, we might have much less information.
During the training of network under reinforcement learning, the network receives some
feedback from the environment. This makes it somewhat similar to supervised learning. However,
the feedback obtained here is evaluative not instructive, which means there is no teacher as in
supervised learning. After receiving the feedback, the network performs adjustments of the
weights to get better critic information in future.
Bayes Theorem:
Using Bayes theorem, we can find the probability of A happening, given that B has occurred −
P(A|B) = P(B|A) × P(A) / P(B)
Here, B is the evidence and A is the hypothesis. The assumption made here is that the
predictors/features are independent; that is, the presence of one particular feature does not
affect another. Hence it is called naive.
A and B are Boolean variables that represent the occurrence of an event.
If the probability of the event is uncertain then the probability is between 0 and 1.
For example
Suppose you are one of the 1 in 10 people that have a headache (H), i.e. P(H) = 1/10.
Suppose 1 in 40 people have the flu (F), i.e. P(F) = 1/40.
Given the fact that you have a headache, what are the chances that you have the flu, i.e. P(F|H) = ?
To answer this we need P(H|F), the conditional probability of headache given flu (often
called the likelihood of headache given flu). Suppose, for illustration, that half of all flu
sufferers have a headache, i.e. P(H|F) = 1/2. Then
P(F|H) = P(H|F) × P(F) / P(H) = (1/2 × 1/40) / (1/10) = 1/8
What is the probability that a person has a headache and the flu?
P(H and F) = P(H|F) × P(F) = 1/2 × 1/40 = 1/80.
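The headache/flu example can be computed directly from Bayes theorem. P(H) = 1/10 and P(F) = 1/40 come from the text; the likelihood P(H|F) = 1/2 is an assumed value for illustration:

```python
# Bayes' theorem: P(F|H) = P(H|F) * P(F) / P(H)
p_h = 1 / 10          # P(headache), from the text
p_f = 1 / 40          # P(flu), from the text
p_h_given_f = 1 / 2   # assumed: half of flu sufferers have a headache

p_f_given_h = p_h_given_f * p_f / p_h
print(round(p_f_given_h, 3))      # 0.125, i.e. 1/8

# Joint probability of headache AND flu: P(H and F) = P(H|F) * P(F)
print(round(p_h_given_f * p_f, 4))  # 0.0125, i.e. 1/80
```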
Bayesian networks
A Bayesian network is a data structure used to represent knowledge in an uncertain domain
(i.e) to represent the dependence between variables and to give a whole specification of the joint
probability distribution.
A Bayesian network is a probabilistic graphical model that represents a set of variables and
their conditional dependencies via a directed acyclic graph (DAG).
A belief network is a graph in which the following holds −
A set of random variables makes up the nodes of the network.
A set of directed links (arrows) connects pairs of nodes; an arrow from node X to node Y
means that X is a parent of Y.
Each node has a conditional probability table (CPT) that quantifies the effect its parents
have on it.
The graph has no directed cycles (hence it is a DAG).
Example 1:
Uncertainty:
I. Mary is currently listening to loud music.
II. John confuses the telephone ring with the alarm → laziness and ignorance in the operation.
III. The alarm may fail to go off → power failure, dead battery, cut wires, etc.
[Network structure: Burglary (B) and Earthquake (E) are the parents of Alarm (A);
Alarm is the parent of John and Mary, the neighbours who may call.]
Conditional Probability Tables

P(Burglary):
B = T : 0.001
B = F : 0.999

P(Earthquake):
E = T : 0.002
E = F : 0.998

P(Alarm | Burglary, Earthquake):
B  E  | A = True  A = False
T  T  | 0.95      0.05
T  F  | 0.94      0.06
F  T  | 0.29      0.71
F  F  | 0.001     0.999

P(John | Alarm):
A | John = T  John = F
T | 0.90      0.10
F | 0.05      0.95

P(Mary | Alarm):
A | Mary = T  Mary = F
T | 0.70      0.30
F | 0.01      0.99
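The CPTs above combine via the chain rule for Bayesian networks, P(X1, ..., Xn) = product of P(Xi | parents(Xi)). A sketch computing the probability that both John and Mary call and the alarm sounds, with no burglary and no earthquake:

```python
# Chain rule over the burglary network:
#   P(J, M, A, not B, not E)
#     = P(J|A) * P(M|A) * P(A | not B, not E) * P(not B) * P(not E)
p_b = 0.001                       # P(Burglary = T)
p_e = 0.002                       # P(Earthquake = T)
p_a_given = {                     # P(Alarm = T | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
p_j_given_a = {True: 0.90, False: 0.05}   # P(JohnCalls = T | A)
p_m_given_a = {True: 0.70, False: 0.01}   # P(MaryCalls = T | A)

p = (p_j_given_a[True] * p_m_given_a[True]
     * p_a_given[(False, False)] * (1 - p_b) * (1 - p_e))
print(round(p, 8))  # ≈ 0.00062811
```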
A knowledge engineer can build a Bayesian network. There are a number of steps the
knowledge engineer needs to take while building it.
Example 2:
Lung cancer. A patient has been suffering from breathlessness. He visits the doctor,
suspecting he has lung cancer. The doctor knows that barring lung cancer, there are various other
possible diseases the patient might have such as tuberculosis and bronchitis.
Is the patient a smoker? If yes, then there are high chances of cancer and bronchitis.
Is the patient exposed to air pollution? If yes, what sort of air pollution?
Take an X-Ray: a positive X-ray would indicate either TB or lung cancer.
For now let us consider nodes with only discrete values. Each variable must take on exactly
one of these values at a time.
The Lung-Cancer node has two parents (reasons or causes): Pollution and Smoker, while
node Smoker is an ancestor of node X-Ray. Similarly, X-Ray is a child (consequence or effects) of
node Lung-Cancer and successor of nodes Smoker and Pollution.
First, for each node we need to look at all the possible combinations of values of its
parent nodes. Each such combination is called an instantiation of the parent set. For each
distinct instantiation of parent node values, we need to specify the probability that the
child node takes each of its values.
For example, the Lung-Cancer node’s parents are Pollution and Smoker. Their combined
possible values are {(H,T), (H,F), (L,T), (L,F)}. The CPT specifies the probability of cancer for
each of these cases as <0.05, 0.02, 0.03, 0.001> respectively.
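The instantiation-to-probability mapping can be sketched as a simple table lookup; the node and value names follow the text (H/L for pollution, True/False for smoker):

```python
# CPT for the Lung-Cancer node: parents are Pollution (H = high,
# L = low) and Smoker (True/False). Probabilities are from the text.
cancer_cpt = {
    ("H", True):  0.05,
    ("H", False): 0.02,
    ("L", True):  0.03,
    ("L", False): 0.001,
}

def p_cancer(pollution, smoker):
    """P(Cancer = true | Pollution, Smoker), looked up by parent instantiation."""
    return cancer_cpt[(pollution, smoker)]

print(p_cancer("H", True))   # 0.05
print(p_cancer("L", False))  # 0.001
```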
1. Social Media
Facebook
As soon as you upload any photo to Facebook, the service automatically highlights
faces and prompts friends to tag.
Instagram
Instagram uses deep learning, making use of a connection of recurrent neural networks,
to identify the contextual meaning of an emoji, which has been steadily replacing slang
(for instance, a laughing emoji could replace “rofl”).
Pinterest
Pinterest uses computer vision – another application of neural networks, where we
teach computers to “see” like a human, in order to automatically identify objects in
images (or “pins”, as they call it) and then recommend visually similar
pins. Other applications of neural networks at Pinterest include spam prevention, search
and discovery, ad performance and monetization, and email marketing.
2. Online Shopping
Search
Your Amazon searches (“earphones”, “pizza stone”, “laptop charger”, etc) return a list
of the most relevant products related to your search, without wasting much time. In a
description of its product search technology, Amazon states that its algorithms learn
automatically to combine multiple relevant features. It uses past patterns and adapts to
what is important for the customer in question.
Recommendations
Amazon shows you recommendations using its “customers who viewed this item also
viewed”, “customers who bought this item also bought”, and also via curated
recommendations on your homepage, on the bottom of the item pages, and through
emails. Amazon makes use of Artificial Neural Networks to train its algorithms to learn
the pattern and behaviour of its users. This, in turn, helps Amazon provide even better
and customized recommendations.
3. Banking/Personal Finance
Cheque Deposits through Mobile
Most large banks are eliminating the need for customers to physically deliver a
cheque to the bank by offering the ability to deposit cheques through a
smartphone application. The technologies that power these applications use Neural
Networks to decipher and convert handwriting on cheques into text. Essentially, Neural
Networks find themselves at the core of any application that requires
handwriting/speech/image recognition.
Fraud Prevention
Artificial Intelligence is used to create systems that learn, through training, what types
of transactions are fraudulent (that is, learning via Neural Networks!).
4. Image Processing and Character recognition
Character recognition, such as handwriting recognition, has a lot of applications in fraud
detection (e.g. bank fraud) and even national security assessments.
Image recognition is an ever-growing field with widespread applications, from facial
recognition in social media and cancer detection in medicine to satellite imagery
processing for agricultural and defence usage.
5. Forecasting