Module 1 Course Materials
Chapter 1
Introduction to Artificial Intelligence
Artificial Intelligence (AI) is an approach to making a computer, a robot, or a product think the way intelligent humans think. AI studies how the human brain thinks, learns, decides, and works when it tries to solve problems, and the outcome of that study is intelligent software systems. The aim of AI is to improve computer functions that are related to human intelligence, such as:
● Reasoning
● Learning
● Problem Solving
● Perception
● Linguistic Intelligence
Definitions of AI
Some definitions of AI, organised into the four standard categories:

Systems that think like humans:
● "The exciting new effort to make computers think... machines with minds, in the full and literal sense." (Haugeland, 1985)
● "[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning..." (Bellman, 1978)

Systems that think rationally:
● "The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
● "The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

Systems that act like humans:
● "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
● "The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

Systems that act rationally:
● "A field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes." (Schalkoff, 1990)
● "The branch of computer science that is concerned with the automation of intelligent behaviour." (Luger and Stubblefield, 1993)
AI is a multidisciplinary field that draws on computer science, mathematics and statistics, electrical engineering, and software and hardware engineering.
Advantages of AI:
AI brings advantages such as speed, consistency, and round-the-clock availability.
Disadvantages of AI:
A few disadvantages of AI are:
1) High Costs:
As AI is updated every day, the hardware and software need to be updated over time to meet the latest requirements. Machines need repair and maintenance, which involves significant costs, and building them in the first place is expensive because they are very complex systems.
2) Making Humans Lazy:
AI is making humans lazy, with its applications automating the majority of the work. Humans tend to get addicted to these inventions, which can cause a problem for future generations.
3) Unemployment:
As AI replaces the majority of repetitive tasks and other work with robots, human involvement is shrinking, which will cause a major problem for employment standards. Every organization is looking to replace its minimally qualified workers with AI robots that can do similar work with greater efficiency.
4) No Emotions:
There is no doubt that machines are much better when it comes to working efficiently, but they cannot replace the human connection that holds a team together. Machines cannot develop a bond with humans, which is an essential attribute when it comes to team management.
5) No Out-of-the-Box Thinking:
Machines can perform only the tasks they are designed or programmed to do; for anything outside that, they tend to crash or give irrelevant outputs, which can be a major drawback.
Applications of AI:
Artificial intelligence has dramatically changed the business landscape. What
started as a rule-based automation is now capable of mimicking human interaction. It
is not just the human-like capabilities that make artificial intelligence unique. An
advanced AI algorithm offers far better speed and reliability at a much lower cost compared to its human counterparts. AI has impacted various fields like marketing, finance, banking, and so on. The various application domains of AI are:
1. AI In Marketing
2. AI In Banking
3. AI In Finance
4. AI In Agriculture
5. AI In HealthCare
6. AI In Gaming
7. AI In Space Exploration
8. AI In Autonomous Vehicles
9. AI In Chatbots
10. AI In Artificial Creativity
Marketing
Marketing is a way to sugar coat your products to attract more customers. We,
humans, are pretty good at sugar coating, but what if an algorithm or a bot is built
solely for the purpose of marketing a brand or a company? It would do a pretty
awesome job!
In the early 2000s, if we searched an online store for a product without knowing its exact name, finding it was a nightmare. But now, when we search for an item on any e-commerce store, we get all possible results related to the item. It's as if these search engines read our minds! In a matter of seconds, we get a list of all relevant items. An example of this is finding the right movies on Netflix.
One reason why we're all obsessed with Netflix and chill is that Netflix provides highly accurate predictive technology based on customers' reactions to films. It examines millions of records to suggest shows and films that you might like based on your previous actions and choices of films. As the data set grows, this technology is getting smarter every day.
With the growing advancement of AI, in the near future it may be possible for consumers on the web to buy products by snapping a photo of them. Companies like CamFind and its competitors are already experimenting with this.
Banking
AI in banking is growing faster than you thought! A lot of banks have already
adopted AI-based systems to provide customer support, detect anomalies and credit
card frauds. An example of this is HDFC Bank.
HDFC Bank has developed an AI-based chatbot called EVA (Electronic Virtual
Assistant), built by Bengaluru-based Senseforth AI Research.
Since its launch, Eva has addressed over 3 million customer queries, interacted with
over half a million unique users, and held over a million conversations. Eva can collect
knowledge from thousands of sources and provide simple answers in less than 0.4
seconds.
The use of AI for fraud prevention is not a new concept. In fact, AI solutions can
be used to enhance security across a number of business sectors, including retail and
finance.
By tracing card usage and endpoint access, security specialists are more
effectively preventing fraud. Organizations rely on AI to trace those steps by analyzing
the behaviors of transactions.
Companies such as MasterCard and RBS WorldPay have relied on AI and Deep
Learning to detect fraudulent transaction patterns and prevent card fraud for years
now. This has saved millions of dollars.
Finance
Ventures have been relying on computers and data scientists to determine
future patterns in the market. Trading mainly depends on the ability to predict the
future accurately.
Machines are great at this because they can crunch a huge amount of data in a
short span. Machines can also learn to observe patterns in past data and predict how
these patterns might repeat in the future.
The new system stores a vast amount of price and trading data in its computer.
By tapping into this reservoir of information, it will make assessments, for example, it
may determine that current market conditions are similar to the conditions two
weeks ago and predict how share prices will be changing a few minutes down the line.
This will help to take better trading decisions based on the predicted market prices.
Agriculture
Here’s an alarming fact, the world will need to produce 50 percent more food
by 2050 because we’re literally eating up everything! The only way this can be
possible is if we use our resources more carefully. With that being said, AI can help
farmers get more from the land while using resources more sustainably.
Issues such as climate change, population growth, and food security concerns
have pushed the industry into seeking more innovative approaches to improve crop
yield.
Organizations are using automation and robotics to help farmers find more
efficient ways to protect their crops from weeds.
Blue River Technology has developed a robot called See & Spray which uses
computer vision technologies like object detection to monitor and precisely spray
weedicide on cotton plants. Precision spraying can help prevent herbicide resistance.
Apart from this, a Berlin-based agricultural tech start-up called PEAT has developed an application called Plantix that identifies potential defects and nutrient deficiencies in the soil through images.
The image recognition app identifies possible defects through images captured
by the user’s smartphone camera. Users are then provided with soil restoration
techniques, tips, and other possible solutions. The company claims that its software
can achieve pattern detection with an estimated accuracy of up to 95%.
HealthCare
When it comes to saving our lives, a lot of organizations and medical care centers are relying on AI. There are many examples of how AI in healthcare has helped patients all over the world. One such example is Coala Life, a company with a digitalized device that can detect cardiac diseases.
Similarly, Aifloo is developing a system for keeping track of how people are
doing in nursing homes, home care, etc. The best thing about AI in healthcare is that
you don’t even need to develop a new medication. Just by using an existing
medication in the right way, you can also save lives.
Gaming
Over the past few years, Artificial Intelligence has become an integral part of
the gaming industry. In fact, one of the biggest accomplishments of AI is in the gaming
industry.
The actions taken by the opponent AI are unpredictable because the game is
designed in such a way that the opponents are trained throughout the game and
never repeat the same mistakes. They get better as the game gets harder. This makes
the game very challenging and prompts the players to constantly switch strategies
and never sit in the same position.
Space Exploration
Space expeditions and discoveries always require analyzing vast amounts of data. Artificial Intelligence and Machine Learning are the best way to handle and process data on this scale. After rigorous research, astronomers used Artificial Intelligence to sift through years of data obtained by the Kepler telescope in order to identify a distant eight-planet solar system.
Artificial Intelligence is also being used for NASA's rover missions to Mars, such as the Mars 2020 rover. AEGIS, an AI-based targeting system, is already on the red planet aboard a rover, where it is responsible for autonomously targeting cameras in order to perform investigations on Mars.
Autonomous Vehicles
For the longest time, self-driving cars have been a buzzword in the AI industry. The development of autonomous vehicles will definitely revolutionize the transport system.
Advanced Deep Learning algorithms can accurately predict what objects in the vehicle's vicinity are likely to do. This makes Waymo cars more effective and safer.
Elon Musk talks a lot about how AI is implemented in Tesla's self-driving cars and Autopilot features. He has said that
"Tesla will have fully self-driving cars ready by the end of the year and a "robotaxi" version – one that can ferry passengers without anyone behind the wheel – ready for the streets next year".
Chatbots
These days, virtual assistants have become a very common technology. Almost every household has a virtual assistant that controls the appliances at home. A few examples include Siri and Cortana, which are gaining popularity because of the user experience they provide.
Another example is Google's virtual assistant, Google Duplex, which has astonished millions of people. Not only can it respond to calls and book appointments for you, but it also adds a human touch.
It uses natural language processing and machine learning algorithms to process human language and perform tasks such as managing your schedule, controlling your smart home, making a reservation, and so on.
Social Media
Ever since social media has become our identity, we’ve been generating an
immeasurable amount of data through chats, tweets, posts and so on. And wherever
there is an abundance of data, AI and Machine Learning are always involved.
In social media platforms like Facebook, AI is used for face verification wherein
machine learning and deep learning concepts are used to detect facial features and
tag your friends. Deep Learning is used to extract every minute detail from an image
by using a bunch of deep neural networks. On the other hand, Machine learning
algorithms are used to design your feed based on your interests.
Another such example is Twitter’s AI, which is being used to identify hate
speech and terroristic language in tweets. It makes use of Machine Learning, Deep
Learning, and Natural language processing to filter out offensive content. The
company discovered and banned 300,000 terrorist-linked accounts, 95% of which
were found by non-human, artificially intelligent machines.
Artificial Creativity
Have you ever wondered what would happen if an artificially intelligent
machine tried to create music and art?
An AI-based system called MuseNet can now compose classical music that
echoes the classical legends, Bach and Mozart.
Tech giants such as Yahoo, Microsoft, Tableau, are using WordSmith to generate
around 1.5 billion pieces of content every year.
Advanced search:
What is a Search Algorithm in AI?
A search algorithm in AI takes a search problem (an initial state, a set of actions, and a goal state) and explores the possible states in order to return a path from the initial state to the goal state. Some basic terminology:
● Intermediate state: a state between the starting state and the goal state that we need to pass through.
● Optimal solution: a solution that has the lowest cost among all solutions.
● Search tree: a tree representation of the search problem; the root node of the search tree corresponds to the initial state.
● Solution (plan): the solution to a search problem is a sequence of actions, called the plan, that transforms the start state into the goal state.
Following are the four essential properties of search algorithms to compare the
efficiency of these algorithms:
Completeness: A search algorithm is said to be complete if it guarantees to return a
solution if at least any solution exists for any random input.
Optimality: If a solution found for an algorithm is guaranteed to be the best solution
(lowest path cost) among all other solutions, then such a solution is said to be an
optimal solution.
Time Complexity: Time complexity is a measure of time for an algorithm to complete
its task.
Space Complexity: It is the maximum storage space required at any point during the search, expressed in terms of the complexity of the problem.
Based on the search problems we can classify the search algorithms into
uninformed (Blind search) search and informed search (Heuristic search) algorithms.
Search Algorithms
Uninformed search includes algorithms such as Breadth-First Search, Depth-First Search, and Uniform-Cost Search, while informed search includes Greedy Best-First Search and A* Search.
Uninformed Search:
Uninformed search is used when there is no information about the cost of navigating between states.
The uninformed search does not contain any domain knowledge such as closeness,
the location of the goal. It operates in a brute-force way as it only includes
information about how to traverse the tree and how to identify leaf and goal nodes.
Uninformed search applies a way in which search trees are searched without any
information about the search space like initial state operators and test for the goal, so
it is also called blind search. It examines each node of the tree until it achieves the
goal node.
Informed Search:
An informed search is used when we know the cost or have a solid estimate of the
cost between states.
Informed search algorithms use domain knowledge. In an informed search,
problem information is available which can guide the search. Informed search
strategies can find a solution more efficiently than an uninformed search strategy.
Informed search is also called a Heuristic search.
A heuristic is a technique that is not guaranteed to find the best solution, but is designed to find a good solution in a reasonable time.
Uninformed Search Algorithms
Uninformed search is a class of general-purpose search algorithms which
operates in brute force-way. Uninformed search algorithms do not have additional
information about state or search space other than how to traverse the tree, so it is
also called blind search.
Breadth-first Search:
● Breadth-first search is the most common search strategy for traversing a tree or
graph. This algorithm searches breadthwise in a tree or graph, so it is called
breadth-first search.
● The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving on to the nodes of the next level.
● The breadth-first search algorithm is an example of a general-graph search algorithm.
● Breadth-first search is implemented using a FIFO queue data structure, as sketched below.
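As a rough illustration of the idea above, here is a minimal Python sketch of BFS over a small hypothetical graph stored as an adjacency dictionary; the frontier is a FIFO queue of partial paths, so nodes are expanded level by level.

from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search over a graph given as {node: [successors]}.
    Returns a shallowest path from start to goal, or None if no path exists."""
    frontier = deque([[start]])          # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest path first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None

# Hypothetical tree with root S and goal K
graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
         'C': ['E', 'F'], 'D': [], 'G': ['I'], 'H': [], 'E': [], 'F': [], 'I': ['K']}
print(bfs(graph, 'S', 'K'))              # ['S', 'B', 'G', 'I', 'K']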
Advantages:
● BFS will provide a solution if any solution exists.
● If there is more than one solution for a given problem, BFS will provide the minimal solution, i.e. the one requiring the fewest steps.
Disadvantages:
● It requires lots of memory, since each level of the tree must be saved in memory in order to expand the next level.
● BFS needs lots of time if the solution is far away from the root node.
Example:
Consider a tree with root node S and goal node K. The BFS algorithm traverses the tree in layers, so it visits the nodes level by level and the traversal order will be:
S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K
Time Complexity: The time complexity of the BFS algorithm is given by the number of nodes traversed by BFS up to the shallowest goal node, where d is the depth of the shallowest solution and b is the branching factor (the number of successors of each node):
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)
Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some
finite depth, then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the
node.
Depth-first Search
● Depth-first search is a recursive algorithm for traversing a tree or graph data
structure.
● It is called depth-first search because it starts from the root node and follows each path to its greatest depth before moving to the next path.
● DFS uses a stack (LIFO) data structure for its implementation, as sketched below.
● The overall process of the DFS algorithm is similar to the BFS algorithm.
Note: Backtracking is an algorithm technique for finding all possible solutions using
recursion.
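For comparison, here is a minimal Python sketch of DFS, again over a small hypothetical adjacency-dictionary graph; replacing the FIFO queue with a LIFO stack is the only essential change from the BFS sketch.

def dfs(graph, start, goal):
    """Depth-first search using an explicit stack (LIFO).
    Returns one path from start to goal, not necessarily the shortest."""
    stack = [[start]]                    # stack of partial paths
    visited = set()
    while stack:
        path = stack.pop()               # deepest (most recently added) path first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # push successors in reverse so the left-most child is explored first
        for successor in reversed(graph.get(node, [])):
            if successor not in visited:
                stack.append(path + [successor])
    return None

# Hypothetical tree: DFS explores S, A, D, E, then backtracks to B, C, G
graph = {'S': ['A', 'B'], 'A': ['D', 'E'], 'B': ['C'], 'C': ['G'],
         'D': [], 'E': [], 'G': []}
print(dfs(graph, 'S', 'G'))              # ['S', 'B', 'C', 'G'] for this graph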
Advantages:
● DFS requires much less memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
● It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantages:
● There is the possibility that many states keep reoccurring, and there is no
guarantee of finding the solution.
● DFS algorithm goes for deep down searching and sometimes it may go to the
infinite loop.
Example:
Consider a search tree traversed using depth-first search in the order: root node ---> left node ---> right node.
The search starts from root node S and traverses A, then B, then D and E. After traversing E, it backtracks, as E has no other successors and the goal node has not yet been found. After backtracking, it traverses node C and then G, where it terminates because the goal node has been found.
Completeness: DFS search algorithm is complete within finite state space as it will
expand every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm:
T(b) = 1 + b + b^2 + ... + b^m = O(b^m)
where m is the maximum depth of any node, which can be much larger than d (the depth of the shallowest solution), and b is the branching factor.
Space Complexity: The DFS algorithm needs to store only a single path from the root node, so the space complexity of DFS is equivalent to the size of the fringe set, which is O(b·m).
Optimal: The DFS algorithm is non-optimal, as it may take a large number of steps or incur a high cost to reach the goal node.
Uniform-cost Search:
Uniform-cost search (UCS) expands the node with the lowest cumulative path cost g(n) first, and is implemented with a priority queue. It is used for traversing weighted trees or graphs where each edge can have a different cost.
Advantages:
● Uniform-cost search is optimal, because at every state the path with the least cost is chosen.
Disadvantages:
● It does not care about the number of steps involved in searching and is only concerned with path cost, so the algorithm may get stuck in an infinite loop (for example, when zero-cost actions exist).
Completeness:
Uniform-cost search is complete, such as if there is a solution, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution and ε the minimum cost of a single step toward the goal. Then the number of steps along the optimal path is at most ⌊C*/ε⌋ + 1 (the +1 is because we start from state 0 and go up to C*/ε). Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path
cost.
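A minimal Python sketch of uniform-cost search is shown below; the weighted graph is hypothetical, and the frontier is a priority queue ordered by the accumulated path cost g(n), which is what guarantees that the cheapest path is found first.

import heapq

def uniform_cost_search(graph, start, goal):
    """Expand the frontier node with the lowest path cost g(n).
    The graph is given as {node: [(successor, step_cost), ...]}."""
    frontier = [(0, start, [start])]     # (g, node, path), ordered by g
    explored = set()
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in explored:
            continue
        explored.add(node)
        for successor, step_cost in graph.get(node, []):
            if successor not in explored:
                heapq.heappush(frontier, (g + step_cost, successor, path + [successor]))
    return None

# Hypothetical weighted graph
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 6)], 'B': [('G', 3)]}
print(uniform_cost_search(graph, 'S', 'G'))   # (6, ['S', 'A', 'B', 'G'])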
Greedy Best-First Search (Best-First Search):
Greedy best-first search always expands the node that appears closest to the goal, using the evaluation function f(n) = h(n). Here h(n) is the estimated (heuristic) cost from node n to the goal and h*(n) is the actual cheapest cost; an admissible heuristic satisfies h(n) ≤ h*(n).
Advantages:
● Best first search can switch between BFS and DFS by gaining the advantages of
both the algorithms.
● This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
● In the worst case it can behave like an unguided depth-first search.
● Like DFS, it can get stuck in a loop.
● It is not optimal.
In this search example, we use two lists, the OPEN and CLOSED lists. The following are the iterations for traversing the example graph. We expand the successors of S and put S in the CLOSED list:
Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]
: Open [E, A], Closed [S, B, F]
Iteration 3: Open [I, G, E, A], Closed [S, B, F]
: Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S----> B----->F----> G
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space.
Complete: Greedy best-first search is also incomplete, even if the given state space is
finite.
Optimal: Greedy best first search algorithm is not optimal.
A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses
heuristic function h(n), and cost to reach the node n from the start state g(n). It has
combined features of UCS and greedy best-first search, by which it solves the problem
efficiently. A* search algorithm finds the shortest path through the search space using
the heuristic function. This search algorithm expands less search trees and provides
optimal results faster. A* algorithm is similar to UCS except that it uses g(n)+h(n)
instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.
At each point in the search space, only those nodes are expanded which have the lowest value of f(n), and the algorithm terminates when the goal node is found.
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure
and stops.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function (g + h). If node n is the goal node, then return success and stop; otherwise continue to Step 4.
Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list; if not, compute the evaluation function for n' and place it into the OPEN list.
Step 5: Otherwise, if node n' is already in the OPEN or CLOSED list, attach it to the back pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
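The steps above can be condensed into the short Python sketch below; the frontier is a priority queue ordered by f(n) = g(n) + h(n), and the edge costs and heuristic values are assumptions chosen to be consistent with the worked example that follows.

import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n)."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}                          # cheapest g found so far per node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for successor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            if new_g < best_g.get(successor, float('inf')):
                best_g[successor] = new_g
                heapq.heappush(frontier,
                               (new_g + h[successor], new_g, successor, path + [successor]))
    return None

# Hypothetical graph and admissible heuristic values
graph = {'S': [('A', 1), ('G', 10)], 'A': [('B', 2), ('C', 1)],
         'B': [('D', 5)], 'C': [('D', 3), ('G', 4)], 'D': [('G', 2)]}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}
print(a_star(graph, h, 'S', 'G'))            # (6, ['S', 'A', 'C', 'G'])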
Advantages:
● The A* search algorithm is optimal and complete (under the conditions given below).
● It can solve very complex problems.
Disadvantages:
● It does not always produce the shortest path as it is mostly based on heuristics
and approximation.
● A* Search algorithm has some complexity issues.
● The main drawback of A* is memory requirement as it keeps all generated
nodes in the memory, so it is not practical for various large-scale problems.
Example:
In this example, we traverse a graph using the A* algorithm. The heuristic value of each state is given, and we calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost to reach that node from the start state.
Here we will use OPEN and CLOSED list.
Solution:
Initialization: {(S, 5)}
Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 gives the final result: S ---> A ---> C ---> G, which is the optimal path with cost 6.
Points to remember:
● A* algorithm returns the path which occurred first, and it does not search for all
remaining paths.
● The efficiency of A* algorithm depends on the quality of heuristic.
● The A* algorithm expands all nodes which satisfy the condition f(n) ≤ C*, where C* is the cost of the optimal solution.
Complete: The A* algorithm is complete as long as the branching factor is finite and every action has a fixed, positive cost.
Optimal: The A* algorithm is optimal if it satisfies the following two conditions:
● Admissible: the first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature, i.e. it never overestimates the true cost to the goal.
● Consistency: the second required condition is consistency, which is needed for A* graph search.
If the heuristic function is admissible, then A* tree search will always find the least-cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function, and the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of the A* search algorithm is O(b^d), since all generated nodes are kept in memory.
Constraint Satisfaction Problems (CSP)
Definition:
A constraint satisfaction problem (CSP) is a problem whose solution must satisfy some limitations or conditions, also known as constraints. It consists of the following:
● A finite set of variables which stores the solution (V = {V1, V2, V3,....., Vn})
● A set of discrete values known as domain from which the solution is picked (D =
{D1, D2, D3,.....,Dn})
● A finite set of constraints (C = {C1, C2, C3,......, Cn})
Please note, that the elements in the domain can be both continuous and discrete but
in AI, we generally only deal with discrete values.
Also, note that all these sets should be finite except for the domain set. Each variable in the variable set can have a different domain. For example, consider a Sudoku puzzle: suppose that a row, column, and block already have 3, 5, and 7 filled in. Then the domain for all the unfilled variables in that row, column, and block will be {1, 2, 4, 6, 8, 9}.
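To make the three components concrete, here is a tiny hypothetical CSP in Python (three variables, small finite domains, and "must differ" constraints), solved with simple backtracking; it is only a sketch, not a general CSP solver.

# A tiny CSP: variables, finite domains, and binary "not equal" constraints.
variables = ['V1', 'V2', 'V3']
domains = {'V1': [1, 2, 3], 'V2': [1, 2, 3], 'V3': [1, 2, 3]}
constraints = [('V1', 'V2'), ('V2', 'V3')]   # connected variables must take different values

def consistent(var, value, assignment):
    """Check that assigning value to var violates no constraint."""
    for a, b in constraints:
        other = b if var == a else a if var == b else None
        if other in assignment and assignment[other] == value:
            return False
    return True

def backtrack(assignment):
    """Assign variables one at a time, backtracking on any violated constraint."""
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result:
                return result
    return None

print(backtrack({}))   # e.g. {'V1': 1, 'V2': 2, 'V3': 1}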
In order to solve the complex problems encountered in AI, one generally needs a
large amount of knowledge, and suitable mechanisms for representing and
manipulating all that knowledge. Knowledge can take many forms; a simple example is the fact:
It is raining.
So, how should an AI agent store and manipulate knowledge like this?
What is Knowledge Representation?
Knowledge Representation in AI describes the representation of knowledge.
Basically, it is a study of how the beliefs, intentions, and judgments of an intelligent
agent can be expressed suitably for automated reasoning. One of the primary
purposes of Knowledge Representation is modelling intelligent behaviour for an agent. The kinds of knowledge that need to be represented include:
● Objects: all the facts about objects in our world domain, e.g. guitars have strings, trumpets are brass instruments.
● Events: Events are the actions which occur in our world.
● Performance: It describes behaviour which involves knowledge about how to
do things.
● Meta-knowledge: It is knowledge about what we know.
● Facts: Facts are the truths about the real world and what we represent.
● Knowledge-Base: The central component of the knowledge-based agents is the
knowledge base. It is represented as KB. The Knowledgebase is a group of the
Sentences (Here, sentences are used as a technical term and not identical with
the English language).
Different Types of Knowledge
There are five types of knowledge:
● Declarative Knowledge – It includes concepts, facts, and objects, and is expressed in declarative sentences.
● Structural Knowledge – It is a basic problem-solving knowledge that describes
the relationship between concepts and objects.
● Procedural Knowledge – This is responsible for knowing how to do something
and includes rules, strategies, procedures, etc.
● Meta Knowledge – Meta Knowledge defines knowledge about other types of
Knowledge.
● Heuristic Knowledge – This represents some expert knowledge in the field or
subject.
An AI system has the following components, which together form the AI knowledge cycle:
● Perception
● Learning
● Knowledge Representation & Reasoning
● Planning
● Execution
Here is an example to show the different components of the system and how it
works:
Example
The above diagram shows the interaction of an AI system with the real world and the
components involved in showing intelligence.
In this example, there is one decision-maker whose actions are justified by sensing the
environment and using knowledge. But, if we remove the knowledge part here, it will
not be able to display any intelligent behavior.
Techniques of Knowledge Representation in AI
There are four techniques for representing knowledge: logical representation, semantic networks, frame representation, and production rules.
Logical Representation
Logical representation is a language with some definite rules which deal with
propositions and has no ambiguity in representation. It represents a conclusion based
on various conditions and lays down some important communication rules. Also, it
consists of precisely defined syntax and semantics which supports the sound
inference. Each sentence can be translated into logics using syntax and semantics.
Syntax
● It decides how we can construct legal sentences in the logic.
● It determines which symbols we can use in knowledge representation.
● It also specifies how to write those symbols.
Semantics
● Semantics are the rules by which we can interpret sentences in the logic.
● It assigns a meaning to each sentence.
Advantages:
● Logical representation helps to perform logical reasoning.
● This representation is the basis for the programming languages.
Disadvantages:
● Logical representations have some restrictions and are challenging to work
with.
● This technique may not be very natural, and inference may not be very
efficient.
Frame Representation
A frame is a record-like structure that consists of a collection of attributes and values to describe an entity in the world. Frames are AI data structures that divide knowledge into substructures by representing stereotyped situations. Basically, a frame consists of a collection of slots and slot values of any type and size. Slots have names and values, which are called facets.
Advantages:
● Frame representation makes programming easier by grouping related data together.
● It is fairly flexible and is used by many AI applications.
● It is easy to add slots for new attributes and relations.
Disadvantages:
● Inference in a frame system is not as straightforward as in logic-based representations.
● Frame representation is a very generalized approach, which can make processing less efficient.
Production Rules
In production rules, the agent checks for the condition; if the condition holds, the production rule fires and the corresponding action is carried out. The condition part of the rule determines which rule may be applied to a problem, whereas the action part carries out the associated problem-solving steps. This complete process is called a recognize-act cycle.
The production rules system consists of three main parts:
● The set of production rules
● Working Memory
● The recognize-act-cycle
Advantages:
● The production rules are expressed in natural language, so the representation is easy to understand.
● The rules are highly modular, so they can easily be added or removed.
Disadvantages:
● It does not exhibit any learning capabilities and does not store the result of the
problem for future uses.
● During the execution of the program, many rules may be active. Thus, rule-
based production systems are inefficient.
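Below is a toy Python sketch of the recognize-act cycle described above; the facts and rules are invented for illustration, and a rule fires when all of its condition facts are present in working memory.

# Minimal recognize-act cycle (illustrative sketch; rules and facts are hypothetical).
working_memory = {"bus arrives"}

# Each production rule: (set of condition facts required, fact added by the action)
rules = [
    ({"bus arrives"}, "stop the bus"),
    ({"stop the bus"}, "passengers board"),
]

changed = True
while changed:                      # keep cycling until no rule fires
    changed = False
    for condition, action in rules:
        if condition <= working_memory and action not in working_memory:
            working_memory.add(action)      # fire the rule: carry out the action
            print("Fired rule ->", action)
            changed = True

print(working_memory)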
Approaches to Knowledge Representation
1. Relational Knowledge
In the relational knowledge approach, knowledge is stored as a set of relations between entities, similar to the rows and columns of a relational database.
Example:
Name   | Age | Emp ID
John   | 25  | 100071
Amanda | 23  | 100056
Sam    | 27  | 100042
2. Inheritable Knowledge
In the inheritable knowledge approach, all data must be stored into a hierarchy of
classes and should be arranged in a generalized form or a hierarchical manner. Also,
this approach contains inheritable knowledge which shows a relation between
instance and class, and it is called instance relation. In this approach, objects and
values are represented in Boxed nodes.
Example:
3. Inferential Knowledge
The inferential knowledge approach represents knowledge in the form of formal logic.
Thus, it can be used to derive more facts. Also, it guarantees correctness.
Example:
The statement "John is a cricketer" can be represented as Cricketer(John), and a rule such as "All cricketers are athletes" can be represented as ∀x Cricketer(x) → Athlete(x).
4. Procedural Knowledge
In the procedural knowledge approach, knowledge is represented as small programs or rules that encode how to do things.
Representation Requirements
A good knowledge representation system must have the following properties:
1. Representational Accuracy:
Knowledge Representation system should have the ability to represent
all kinds of required knowledge.
2. Inferential Adequacy:
The Knowledge Representation system should have the ability to manipulate the representational structures to produce new knowledge corresponding to existing structures.
3. Inferential Efficiency:
The ability to direct the inferential knowledge mechanism into the most productive directions by storing appropriate guides.
4. Acquisitional Efficiency:
The ability to acquire new knowledge easily using automatic methods.
Challenges/Issues in Knowledge Representation
Reasoning:
The reasoning is the mental process of deriving logical conclusions and making
predictions from available knowledge, facts, and beliefs. Or we can say, "Reasoning is
a way to infer facts from existing data." It is a general process of thinking rationally,
to find valid conclusions.
In artificial intelligence, the reasoning is essential so that the machine can also think
rationally as a human brain, and can perform like a human.
Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:
● Deductive reasoning
● Inductive reasoning
● Abductive reasoning
● Common Sense Reasoning
● Monotonic Reasoning
● Non-monotonic Reasoning
Note: Inductive and deductive reasoning are the forms of propositional logic.
1. Deductive reasoning:
Deductive reasoning is deducing new information from logically related known
information. It is the form of valid reasoning, which means the argument's conclusion
must be true when the premises are true.
Deductive reasoning is a type of propositional logic in AI, and it requires various rules
and facts. It is sometimes referred to as top-down reasoning, and contradictory to
inductive reasoning.
In deductive reasoning, the truth of the premises guarantees the truth of the
conclusion.
Deductive reasoning mostly starts from general premises and moves to a specific conclusion, as in the example below.
Example:
Premise 1: All humans are mortal.
Premise 2: Socrates is a human.
Conclusion: Therefore, Socrates is mortal.
2. Inductive Reasoning:
Inductive reasoning is a form of reasoning to arrive at a conclusion using limited sets
of facts by the process of generalization. It starts with the series of specific facts or
data and reaches to a general statement or conclusion.
Inductive reasoning is a type of propositional logic, which is also known as cause-
effect reasoning or bottom-up reasoning.
In inductive reasoning, we use historical data or various premises to generate a
generic rule, for which premises support the conclusion.
In inductive reasoning, premises provide probable supports to the conclusion, so the
truth of premises does not guarantee the truth of the conclusion.
Example:
Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all the pigeons to be white.
3. Abductive reasoning:
Abductive reasoning is a form of logical reasoning which starts with single or multiple
observations then seeks to find the most likely explanation or conclusion for the
observation.
Abductive reasoning is an extension of deductive reasoning, but in abductive
reasoning, the premises do not guarantee the conclusion.
Example:
Implication: The cricket ground is wet if it is raining.
Observation: The cricket ground is wet.
Conclusion: It is probably raining.
Probabilistic Reasoning:
Probabilistic reasoning applies the concept of probability to represent uncertain knowledge. In probabilistic reasoning, there are two main ways to solve problems with uncertain knowledge:
● Bayes' rule
● Bayesian Statistics
Note: We will learn the above two rules in later chapters.
As probabilistic reasoning uses probability and related terms, so before understanding
probabilistic reasoning, let's understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, which represent the extremes of uncertainty and certainty:
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates total uncertainty in an event A (it will not occur).
3. P(A) = 1 indicates total certainty in an event A (it will definitely occur).
We can find the probability of an uncertain event by using the formula:
P(A) = (number of outcomes favourable to A) / (total number of outcomes)
For the conditional probability of A given that B has already occurred, we use:
P(A|B) = P(A⋀B) / P(B)
This can be explained with a Venn diagram: once B has occurred, the sample space is reduced to the set B, so we can only calculate event A when event B has already occurred, by dividing the probability P(A⋀B) by P(B).
Example:
In a class, there are 70% of the students who like English and 40% of the students who
like English and mathematics, and then what is the percentage of students those who
like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English. Then P(B) = 0.70 and P(A⋀B) = 0.40, so
P(A|B) = P(A⋀B) / P(B) = 0.40 / 0.70 ≈ 0.57
Hence, 57% of the students who like English also like Mathematics.
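The same calculation can be written as a couple of lines of Python, using the conditional-probability formula P(A|B) = P(A⋀B) / P(B):

# P(B): student likes English; P(A and B): student likes English and Mathematics
p_b = 0.70
p_a_and_b = 0.40

p_a_given_b = p_a_and_b / p_b          # conditional probability P(A|B)
print(f"P(A|B) = {p_a_given_b:.2f}")   # 0.57, i.e. about 57%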
Chapter 2
Introduction to Machine Learning
The term Machine Learning was first coined by Arthur Samuel in 1959, a landmark year for the field.
If you browse the internet for 'what is Machine Learning', you'll find at least 100 different definitions. However, the very first formal definition was given by Tom M. Mitchell:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
But wait, can a machine think or make decisions? Well, if you feed a machine a good amount of data, it will learn how to interpret, process, and analyze this data using Machine Learning algorithms, in order to solve real-world problems.
Before moving any further, let’s discuss some of the most commonly used
terminologies in Machine Learning.
Predictor Variable: one or more features of the data that can be used to predict the output.
The Machine Learning process involves building a Predictive model that can be
used to find a solution for a Problem Statement. To understand the Machine
Learning process let’s assume that you have been given a problem that needs to
be solved by using Machine Learning.
Step 1: Define the Objective of the Problem
The problem is to predict the occurrence of rain in your local area by using Machine Learning.
Step 2 and Step 3: Data Gathering and Data Preparation
The data you collect is almost never in the right format. You will encounter a lot of inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is essential because they might lead to wrongful computations and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix them then and there.
Step 4: Exploratory Data Analysis
Grab your detective glasses, because this stage is all about diving deep into the data and finding all the hidden data mysteries. EDA, or Exploratory Data Analysis, is the brainstorming stage of Machine Learning. Data exploration involves understanding the patterns and trends in the data. At this stage, all the useful insights are drawn and the correlations between the variables are understood.
For example, in the case of predicting rainfall, we know that there is a strong
possibility of rain if the temperature has fallen low. Such correlations must be
understood and mapped at this stage.
Step 5: Building a Machine Learning Model
All the insights and patterns derived during data exploration are used to build the Machine Learning model. This stage always begins by splitting the data set into two parts: training data and testing data. The training data is used to build and analyze the model. The logic of the model is based on the Machine Learning algorithm that is being implemented.
In the case of predicting rainfall, since the output will be in the form of True (if it
will rain tomorrow) or False (no rain tomorrow), we can use a Classification
Algorithm such as Logistic Regression.
Choosing the right algorithm depends on the type of problem you’re trying to
solve, the data set and the level of complexity of the problem. In the upcoming
sections, we will discuss the different types of problems that can be solved by
using Machine Learning.
Step 6: Model Evaluation and Optimization
After building a model using the training data set, it is finally time to put the model to the test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once the accuracy is calculated, any further improvements to the model can be implemented at this stage. Methods like parameter tuning and cross-validation can be used to improve the performance of the model, as sketched below.
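As a sketch of the model-building and evaluation steps, assuming scikit-learn is available, the following Python example trains a Logistic Regression classifier on a tiny made-up weather data set (the numbers are invented, not real measurements):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Made-up features: [temperature (°C), humidity (%)]; label 1 = rain, 0 = no rain
X = np.array([[30, 40], [22, 85], [25, 60], [18, 90], [28, 50],
              [20, 95], [33, 35], [19, 88], [27, 55], [21, 92]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Split into training and testing data, then build the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate on the unseen test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

# Predict for a new day (e.g. 23 °C, 80 % humidity)
print("Rain tomorrow?", bool(model.predict([[23, 80]])[0]))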
Step 7: Predictions
Once the model is evaluated and improved, it is finally used to make predictions.
The final output can be a Categorical variable (eg. True or False) or it can be a
Continuous Quantity (eg. the predicted value of a stock).
In our case, for predicting the occurrence of rainfall, the output will be a
categorical variable.
So that was the entire Machine Learning process. Now it’s time to learn about the
different ways in which Machines can learn.
A machine can learn to solve a problem by following any one of the following three
approaches. These are the ways in which a machine can learn:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Supervised:
“Supervised learning is a technique in which we teach or train the machine using
data which is well labeled.”
Unsupervised:
“Unsupervised learning involves training by using unlabeled data and allowing
the model to act on that information without guidance. “
Think of unsupervised learning as a smart kid who learns without any guidance. In this type of Machine Learning, the model is not fed with labeled data; the model has no clue that 'this image is Tom and this is Jerry'. It figures out patterns and the differences between Tom and Jerry on its own by taking in tons of data.
For example, it identifies prominent features of Tom such as pointy ears, bigger size,
etc, to understand that this image is of type 1. Similarly, it finds such features in Jerry
and knows that this image is of type 2. Therefore, it classifies the images into two
different classes without knowing who Tom is or Jerry is.
Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
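A small sketch of the clustering idea, assuming scikit-learn is available; the customer purchase data is made up purely for illustration, and no labels are given to the algorithm:

import numpy as np
from sklearn.cluster import KMeans

# Made-up customer data: [annual purchases, average basket value]
customers = np.array([[5, 20], [6, 22], [4, 18],        # low spenders
                      [50, 200], [55, 210], [48, 190]])  # high spenders

# Unsupervised: no labels are provided, the algorithm discovers the groupings itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("Cluster labels:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_)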
Semi-Supervised Learning:
In semi-supervised learning, the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a very small amount of labeled data and a very large amount of unlabeled data. The basic procedure is that the programmer first clusters similar data using an unsupervised learning algorithm and then uses the existing labeled data to label the rest of the unlabeled data. The typical use cases of this type of algorithm have a common property: acquiring unlabeled data is relatively cheap, while labeling that data is very expensive.
Intuitively, one may imagine the three types of learning algorithms as Supervised
learning where a student is under the supervision of a teacher at both home and
school, Unsupervised learning where a student has to figure out a concept himself
and Semi-Supervised learning where a teacher teaches a few concepts in class and
gives questions as homework which are based on similar concepts.
1. Continuity Assumption: The algorithm assumes that the points which are closer
to each other are more likely to have the same output label.
2. Cluster Assumption: The data can be divided into discrete clusters and points in
the same cluster are more likely to share an output label.
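One common way to implement this idea is self-training, sketched below with scikit-learn's SelfTrainingClassifier; the data is synthetic, the unlabeled points are marked with the label -1, and the choice of Logistic Regression as the base classifier is an assumption:

import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic 1-D data: two labeled points plus many unlabeled points (label -1)
X = np.array([[1.0], [5.0], [1.2], [5.3], [0.9], [5.1], [1.1], [4.8], [0.8], [5.4]])
y = np.array([0,     1,     -1,    -1,    -1,    -1,    -1,    -1,    -1,    -1])

# The base classifier is trained on the labeled points; its confident predictions
# on unlabeled points are then added as pseudo-labels, and the process repeats.
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[1.05], [5.2]]))   # expected: [0 1]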
Deep Learning:
What is Deep Learning?
The human brain contains approximately 100 billion neurons, and each neuron is connected to thousands of its neighbours. The question is how we can recreate these neurons in a computer. We create an artificial structure called an artificial neural network, made up of nodes or neurons. There are some neurons for the input values and some for the output values, and in between there may be many interconnected neurons in the hidden layers.
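A deliberately tiny NumPy sketch of that structure is shown below: three input neurons, one hidden layer, and a single output neuron with randomly initialised weights (no training is shown here):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Squash a value into the range (0, 1), a common neuron activation."""
    return 1.0 / (1.0 + np.exp(-z))

# 3 input neurons -> 4 hidden neurons -> 1 output neuron
W_hidden = rng.normal(size=(3, 4))
W_output = rng.normal(size=(4, 1))

x = np.array([[0.5, 0.1, 0.9]])                 # one input example
hidden = sigmoid(x @ W_hidden)                  # hidden-layer activations
output = sigmoid(hidden @ W_output)             # network output
print("Network output:", output)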
Architectures:
Common deep learning architectures include deep neural networks, convolutional neural networks, recurrent neural networks, and deep belief networks.
Advantages:
1. Best in-class performance on problems.
2. Reduces need for feature engineering.
3. Eliminates unnecessary costs.
4. Identifies defects easily that are difficult to detect.
Disadvantages:
1. Large amount of data required.
2. Computationally expensive to train.
3. No strong theoretical foundation.
Applications:
1. Automatic Text Generation – Corpus of text is learned and from this model
new text is generated, word-by-word or character-by-character.
Then this model is capable of learning how to spell, punctuate, form sentences,
or it may even capture the style.
2. Healthcare – Helps in diagnosing various diseases and treating them.
3. Automatic Machine Translation – Certain words, sentences, or phrases in one language are transformed into another language (Deep Learning is achieving top results in the areas of text and images).
4. Image Recognition – Recognizes and identifies people and objects in images, as well as understanding content and context. This area is already being used in gaming, retail, tourism, etc.
5. Predicting Earthquakes – Teaches a computer to perform viscoelastic
computations which are used in predicting earthquakes.
Reinforcement Learning:
A reinforcement learning algorithm, or agent, learns by interacting with its
environment. The agent receives rewards by performing correctly and penalties for
performing incorrectly. The agent learns without intervention from a human by
maximizing its reward and minimizing its penalty. It is a type of dynamic programming
that trains algorithms using a system of reward and punishment.
For example, suppose the agent is given two options: a path with water or a path with fire. A reinforcement algorithm works on a reward system: if the agent takes the fire path, rewards are subtracted, and the agent learns that it should avoid the fire path. If it had chosen the water path, the safe path, then some points would have been added to its reward, and the agent would learn which path is safe and which isn't.
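A toy Python sketch of that reward loop follows; the two "paths" and their reward values are invented for illustration, and the update rule is a simple running value estimate rather than full Q-learning:

import random

# Invented rewards: the fire path is penalised, the water (safe) path is rewarded
rewards = {"fire path": -10, "water path": +10}
value = {"fire path": 0.0, "water path": 0.0}   # the agent's learned estimates
alpha, epsilon = 0.5, 0.2                       # learning rate, exploration rate

for episode in range(100):
    # epsilon-greedy choice: mostly pick the best-looking path, sometimes explore
    if random.random() < epsilon:
        action = random.choice(list(rewards))
    else:
        action = max(value, key=value.get)
    reward = rewards[action]
    value[action] += alpha * (reward - value[action])   # update the estimate

print(value)   # the water path ends up with the higher estimated value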
Here’s a table that sums up the difference between Regression, Classification, and
Clustering.
Where,
N=Total number of observation
Yi = Actual value
(a1xi+a0)= Predicted value.
Residuals: The distance between the actual value and predicted values is called
residual. If the observed points are far from the regression line, then the residual will
be high, and so cost function will high. If the scatter points are close to the regression
line, then the residual will be small and hence the cost function.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the
cost function.
o A regression model uses gradient descent to update the coefficients of the line
by reducing the cost function.
o It is done by randomly selecting initial coefficient values and then iteratively updating them to reach the minimum of the cost function, as sketched below.
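A minimal Python sketch of gradient descent for the MSE cost function above, on a small made-up data set, where a1 is the slope and a0 the intercept:

import numpy as np

# Made-up data that roughly follows y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

a1, a0 = 0.0, 0.0          # start from arbitrary coefficient values
lr = 0.01                  # learning rate

for _ in range(5000):
    y_pred = a1 * x + a0
    error = y_pred - y
    # Gradients of MSE = (1/N) * sum((y_pred - y)^2) with respect to a1 and a0
    grad_a1 = (2 / len(x)) * np.sum(error * x)
    grad_a0 = (2 / len(x)) * np.sum(error)
    a1 -= lr * grad_a1     # move against the gradient to reduce the cost
    a0 -= lr * grad_a0

print(f"slope a1 = {a1:.2f}, intercept a0 = {a0:.2f}")   # close to 2 and 1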
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be achieved by below method:
1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and
independent variables on a scale of 0-100%.
o A high value of R-squared indicates less difference between the predicted values and the actual values, and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
o It can be calculated as the ratio of explained variation to total variation:
R-squared = (explained variation) / (total variation) = 1 − (sum of squared residuals) / (total sum of squares)
Advantages:
● Linear regression performs exceptionally well for linearly separable data.
● It is easier to implement and interpret, and efficient to train.
● It handles overfitting pretty well using dimensionality reduction techniques, regularization, and cross-validation.
● It allows extrapolation beyond a specific data set.
Disadvantages:
● It relies on the assumption of linearity between the dependent and independent variables.
● It is often quite prone to noise and overfitting.
● Linear regression is quite sensitive to outliers.
● It is prone to multicollinearity.
Chapter 3
Introduction to Natural Language Processing
Language is a method of communication with the help of which we can speak, read
and write. For example, we think, we make decisions, plans and more in natural
language; precisely, in words. However, the big question that confronts us in this AI era is whether we can communicate in a similar manner with computers. In other words, can human beings communicate with computers in their natural language?
As early as the 1950s computer scientists began attempts at using software to process
and analyze textual components, sentiment, parts of speech, and the various entities
that make up a body of text. Until relatively recently, processing and analyzing
language has been quite a challenge.
Ever since IBM’s Watson won on the game show Jeopardy! , the promise of machines
being able to understand language has slowly edged closer.
Natural language processing is essentially the ability to take a body of text and extract
meaning from it using a computer.
I drove my friend Mary to the park in my Tesla while listening to music on my iPhone.
For a human reader, this is an easily understandable sentence and paints a clear
picture of what’s happening. But for a computer, not so much. For a machine, the
sentence would need to be broken down into its structured parts. Instead of an entire
sentence, the computer would need to see both the individual parts and entities
along with the relations between these entities.
Humans understand that Mary is a friend and that a Tesla is likely a car. Since we have
the context of bringing our friend along with us, we intuitively rule out that we’re
driving something else, like a bicycle. Additionally, after many years of popularity and
cultural references, we all know that an iPhone is a smartphone.
None of the above is understood by a computer without assistance. Now let’s take a
look at how that sentence could be written as structured data from the outset. If
developers had made time in advance to structure the data in our sentence, in
XML you’d see the following entities:
<friend>Mary</friend>
<car>Tesla</car>
<phone>iPhone</phone>
But obviously, this can’t happen on the fly without assistance. As mentioned
previously, we have significantly more unstructured data than structured. And unless
time is taken to apply the correct structure to the text in advance, we have a massive
problem that needs solving. This is where NLP enters the picture.
Natural language processing is needed when you wish to mine unstructured data and
extract meaningful insight from text. General applications of NLP attempt to identify
common entities from a body of text; but when you start working with domain-
specific content, a custom model needs training.
The Components of NLP
In order to understand NLP, we first need to understand the components of its model.
Specifically, natural language processing lets you analyze and extract key metadata
from text, including entities, relations, concepts, sentiment, and emotion.
Let’s briefly discuss each of these aspects that can be extracted from a body of text.
Entities
Likely the most common use case for natural language processing, entities are the
people, places, organizations, and things in your text. In our initial example sentence,
we identified several entities in the text—friend, car, and phone.
Relations
How are entities related? Natural language processing can identify whether there is a
relationship between multiple entities and tell the type of relation between them. For
example, a “createdBy” relation might connect the entities “iPhone” and “Apple.”
Concepts
One of the more magical aspects of NLP is extracting general concepts from the body
of text that may not explicitly appear in the corpus. This is a potent tool. For example,
analysis of an article about Tesla may return the concepts “electric cars“or “Elon
Musk,” even if those terms are not explicitly mentioned in the text.
Keywords
NLP can identify the important and relevant keywords in your content. This allows you
to create a base of words from the corpus that are important to the business value
you’re trying to drive.
Semantic Roles
Semantic roles are the subjects, actions, and the objects they act upon in the text.
Take the sentence, “IBM bought a company.” In this sentence the subject is “IBM,”
the action is “bought,” and the object is “company.” NLP can parse sentences into
these semantic roles for a variety of business uses—for example, determining which
companies were acquired last week or receiving notifications any time a particular
company launches a product.
Categories
Categories describe what a piece of content is about at a high level. NLP can analyze
text and then place it into a hierarchical taxonomy, providing categories to use in
applications. Depending on the content, categories could be one or more of sports,
finance, travel, computing, and so on. Possible applications include placing relevant
ads alongside user-generated content on a website or displaying all the articles talking
about a particular subject.
Emotion
Whether you’re trying to understand the emotion conveyed by a post on social media
or analyze incoming customer support tickets, detecting emotions in text is extremely
valuable. Is the content conveying anger, disgust, fear, joy, or sadness? Emotion
detection in NLP will assist in solving this problem.
Sentiment
Similarly, what is the general sentiment in the content? Is it positive, neutral, or
negative? NLP can provide a score as to the level of positive or negative sentiment of
the text. Again, this proves to be extremely valuable in the context of customer
support. This enables automatic understanding of sentiment related to your product
on a continual basis. Now that we’ve covered what constitutes natural language
processing, let’s look at some examples to illustrate how NLP is currently being used
across various industries.
Enterprise Applications of NLP
The following are some of the best representations of the power of NLP.
Natural Language Processing is everywhere and we use it in our daily lives without
even realizing it. Do you know how spam messages are separated from your emails?
Or autocorrect and predictive typing that saves so much of our time, how does that
happen? Well, it is all part of Natural Language Processing. Here are some examples
of Natural Language Processing technologies used widely:
Intelligent personal assistants – We are all familiar with Siri and Cortana. These mobile software products perform tasks and offer services using a combination of user input, location awareness, and the ability to access information from a variety of online sources, and they are undoubtedly one of the biggest achievements of natural language processing.
Machine translation – To read the description of a beautiful picture on Instagram or to read updates on Facebook, we have all used the 'see translation' command at least once. And Google's translation service helps in urgent situations, or sometimes just to learn a few new words. These are all examples of machine translation, where machines provide us with translations from one natural language to another.
Speech recognition – Converting spoken words into data is an example of
natural language processing. It is used for multiple purposes like dictating to Microsoft
Word, voice biometrics, voice user interface, etc.
Affective computing – It is nothing but emotional intelligence training for
machines. They learn to understand your emotions, feelings, ideas to interact with
you in more humane ways.
Natural language generation – Natural language generation tools scan
structured data, undertake analysis and generate information in text format produced
in natural language.
Natural language understanding – As explained above, it scans content written
in natural languages and generates small, comprehensible summaries of text.
Best tools for Natural Language Understanding available today
Natural Language Processing deals with human language in its most natural form and
on a real-time basis, as it appears in social media content, emails, web pages, tweets,
product descriptions, newspaper articles, and scientific research papers, etc, in a
variety of languages. Businesses need to keep a tab on all this content, constantly.
Here are a few popular natural language understanding software products which
effectively aid them in this daunting task.
Wolfram – Wolfram Alpha is an answer engine developed by Wolfram Alpha
LLC (a subsidiary of Wolfram Research). It is an online service that provides answers to
factual questions by computing the answer from externally sourced, “curated data”.
Natural language toolkit – The Natural Language Toolkit, also known as NLTK, is
a suite of programs used for symbolic and statistical natural language processing
(NLP) for the English language. It is written in the Python programming language and
was developed by Steven Bird and Edward Loper at the University of Pennsylvania.
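As a small illustration of NLTK (assuming the package is installed and the punkt and averaged_perceptron_tagger resources have been downloaded), tokenization and part-of-speech tagging take only a few lines:

# A minimal NLTK sketch: tokenization and part-of-speech tagging.
# Assumes: pip install nltk, plus nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') have been run once.
import nltk

text = "The staff are really friendly and the food is delicious."
tokens = nltk.word_tokenize(text)   # split the sentence into word tokens
tags = nltk.pos_tag(tokens)         # tag each token with its part of speech
print(tags)                         # e.g. [('The', 'DT'), ('staff', 'NN'), ...]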
Stanford coreNLP – Stanford CoreNLP is an annotation-based NLP pipeline that
offers core natural language analysis. The basic distribution provides model files for
the analysis of English, but the engine is compatible with models for other languages.
GATE (General Architecture for Text Engineering) – It offers a wide range of
natural language processing tasks. It is mature software that has been used across
industries for more than 15 years.
Sentiment Analysis:
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use
of natural language processing, text analysis, computational linguistics,
and biometrics to systematically identify, extract, quantify, and study affective states
and subjective information. Sentiment analysis is widely applied to voice of the
customer materials such as reviews and survey responses, online and social media,
and healthcare materials for applications that range from marketing to customer
service to clinical medicine.
Examples:
The restaurant is great, the staff are really friendly and the food is delicious. ---
Positive (sentiment)
I would not recommend this restaurant to anyone; the food is terrible and really
expensive. --- Negative (sentiment)
More Challenging Examples:
The movie is surprising with plenty of unsettling plot twists. (Negative term used in a
positive sense in certain domains).
I love my mobile but would not recommend it to any of my colleagues. (Qualified
positive sentiment, difficult to categorise.)
Sentiment Analysis is the process of determining whether a piece of writing is
positive, negative or neutral. A sentiment analysis system for text analysis combines
natural language processing (NLP) and machine learning techniques to assign
weighted sentiment scores to the entities, topics, themes and categories within a
sentence or phrase.
Why is Sentiment Analysis needed?
In today’s environment where we’re suffering from data overload (although this does
not mean better or deeper insights), companies might have mountains of customer
feedback collected. Yet for mere humans, it’s still impossible to analyze it all manually
without error or bias. A sentiment analysis system automates the task; broadly, it will:
1. Break each text document down into its component parts (sentences, phrases,
tokens and parts of speech)
2. Identify each sentiment-bearing phrase and component
3. Assign a sentiment score to each phrase and component (-1 to +1)
4. Optional: Combine scores for multi-layered sentiment analysis
For example:
Terrible pitching and awful hitting led to another crushing loss.
Bad pitching and mediocre hitting cost us another close game.
Both sentences discuss a similar subject, the loss of a baseball game. But you, the
human reading them, can clearly see that the first sentence’s tone is much more negative.
When you read the sentences above, your brain draws on your accumulated
knowledge to identify each sentiment-bearing phrase and interpret their negativity or
positivity. Usually this happens subconsciously. For example, you instinctively know
that a game that ends in a “crushing loss” has a higher score differential than the
“close game”, because you understand that “crushing” is a stronger adjective than
“close”.
Two basic techniques for sentiment analysis
1. Rule based sentiment analysis
Usually, a rule-based system uses a set of human-crafted rules to help identify
subjectivity, polarity, or the subject of an opinion. These rules typically draw on classic
NLP techniques such as stemming, tokenization, part-of-speech tagging and parsing,
together with lexicons (i.e. lists of words and expressions).
Here’s a basic example of how a rule-based system works:
Rule-based systems are very naive since they don't take into account how words are
combined in a sequence. Of course, more advanced processing techniques can be
used, and new rules added to support new expressions and vocabulary. However,
adding new rules may affect previous results, and the whole system can get very
complex. Since rule-based systems often require fine-tuning and maintenance, they’ll
also need regular investments.
2. Machine learning based sentiment analysis
Here, we train an ML model to recognize the sentiment based on the words and their
order using a sentiment-labelled training set. This approach depends largely on the
type of algorithm and the quality of the training data used.
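The details depend on the library and the data; as a minimal sketch (assuming scikit-learn is available, and with a tiny invented training set purely for illustration):

# Minimal machine-learning sentiment sketch using scikit-learn.
# The labelled examples are invented; a real system needs far more data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "great food and friendly staff",
    "I love this place",
    "terrible service and awful food",
    "really expensive and not worth it",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["the staff were friendly and the food was great"]))  # likely 'positive'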
Let’s look again at the stock trading example mentioned above. We take news
headlines, and narrow them to lines which mention the particular company that we
are interested in (often done by another NLP technique, called Named Entity
Recognition) and then gauge the polarity of the sentiment in the text.
One way to make this approach fit other types of problems is to measure polarity
across other dimensions. You could look at specific emotions. How angry was the
person when they were writing the text? How much fear is conveyed in the text?
Applications
Broadly speaking, sentiment analysis is most effective when used as a tool for Voice of
Customer and Voice of Employee. Business analysts, product managers, customer
support directors, human resources and workforce analysts, and other stakeholders
use sentiment analysis to understand how customers and employees feel about
particular subjects, and why they feel that way.
SEGMENTATION
Text segmentation is the process of dividing written text into meaningful units, such
as words, sentences, or topics. The term applies both to mental processes used by
humans when reading text, and to artificial processes implemented in computers,
which are the subject of natural language processing.
In any text document, there are particular terms that represent specific entities that
are more informative and have a unique context. These entities are known as named
entities, which more specifically refer to terms that represent real-world objects like
people, places, organizations, and so on, which are often denoted by proper names. A
naive approach could be to find these by looking at the noun phrases in text
documents. Named entity recognition (NER), also known as entity
chunking/extraction, is a popular technique used in information extraction to identify
and segment the named entities and classify or categorize them under various
predefined classes.
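As a small illustration of NER (this sketch uses spaCy, which is not one of the tools discussed above, and assumes the en_core_web_sm English model has been installed):

# Minimal named entity recognition sketch with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in London next January.")

for ent in doc.ents:
    # ent.label_ is the predefined class, e.g. ORG, GPE (place), DATE
    print(ent.text, ent.label_)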
Speech Fundamentals:
Developing and understanding Automatic Speech Recognition (ASR) systems is
an inter-disciplinary activity, requiring expertise in linguistics, computer science,
mathematics, and electrical engineering.
When a human speaks a word, they cause their voice to make a time-varying
pattern of sounds. These sounds are waves of pressure that propagate through the
air. The sounds are captured by a sensor, such as a microphone or microphone array,
and turned into a sequence of numbers representing the pressure change over time.
The automatic speech recognition system converts this time-pressure signal into a
time-frequency-energy signal. It has been trained on a curated set of labeled speech
sounds, and labels the sounds it is presented with. These acoustic labels are combined
with a model of word pronunciation and a model of word sequences, to create a
textual representation of what was said.
The following definitions are the basics needed for understanding speech recognition
technology.
Utterance
Speaker Dependence
Vocabularies
Vocabularies (or dictionaries) are lists of words or utterances that can be
recognized by the SR system. Generally, smaller vocabularies are easier for a
computer to recognize, while larger vocabularies are more difficult. Unlike
normal dictionaries, each entry doesn't have to be a single word. They can be as
long as a sentence or two. Smaller vocabularies can have as few as 1 or 2
recognized utterances (e.g. "Wake Up"), while very large vocabularies can have
a hundred thousand or more!
Accuracy
Training
Some speech recognizers have the ability to adapt to a speaker. When the
system has this ability, it may allow training to take place. An ASR system is
trained by having the speaker repeat standard or common phrases and
adjusting its comparison algorithms to match that particular speaker. Training a
recognizer usually improves its accuracy.
Speech Analysis:
Speech analytics is the process of analyzing recorded calls to gather customer
information to improve communication and future interaction. The process is
primarily used by customer contact centers to extract information buried in client
interactions with an enterprise.
Speech Modelling:
Speech recognition works using algorithms organised around several models:
Acoustic Modeling
Language modeling
Hidden Markov Models
Acoustic Modeling:
An acoustic model relates short frames of the audio signal to the speech sounds
(phones) they most likely represent. The decoder then looks in the grammar file
for a matching word or phrase. Since our grammar in this example only contains
one word ("HOUSE"), it returns the word "HOUSE" to the calling program.
Language modeling:
Statistical language models (SLMs) are good for free-form input, such as
dictation or spontaneous speech, where it's not practical or possible to a priori specify
all possible legal word sequences.
Trigram SLMs are probably the most common ones used in ASR and represent a
good balance between complexity and robust estimation. A trigram model encodes
the probability of a word (w3) given its immediate two-word history, i.e. p(w3 | w1
w2). In practice, trigram models can be "backed off" to bigram and unigram models,
allowing the decoder to emit any possible word sequence (provided that the acoustic
and lexical evidence is there).
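As a toy sketch of the idea (plain Python, maximum-likelihood counts only, no smoothing or back-off; the corpus is invented):

# Toy trigram language model: estimate p(w3 | w1, w2) from raw counts.
# Real systems add smoothing and back-off to bigram and unigram models.
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def p_trigram(w1, w2, w3):
    # maximum-likelihood estimate: count(w1 w2 w3) / count(w1 w2)
    if bigrams[(w1, w2)] == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(p_trigram("the", "cat", "sat"))  # 0.5: "the cat" is followed by "sat" or "ate"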
Summary of contents
1. Speech Recognition Systems
Introduction
Early speech recognition systems tried to model the human articulatory
channel. They didn’t work. Since the 1970s, these systems have been trained
on example data rather than defined using rules. The transition was caused by
the success of the HEARSAY and HARPY systems at CMU.
Step 1: Speech
Speech consists of pressure waves travelling through the air, created by vibrations of
the larynx and shaped by openings or blockages en route to the outside of the vocal
tract; this shaping produces the different vowels and consonants.
1. The basic pressure wave is full of noise and very context-sensitive, so it is very
difficult to work with. First, perform Fourier transform, to represent the wave
as a sum of waves at a range of frequencies, within a certain window. This is
shown in the speech spectrogram on the previous page. Now work in the
Frequency domain (on the y axis), not the waveform domain. Try various
windows to minimize edge effects, etc.
2. Decompose (deconvolve) the Fourier-transformed waves into a set of vectors,
by cepstral analysis. Chop up the timeline (x-axis) and the frequency space (y-
axis) to obtain ‘little squares’, from which you obtain the quantized vectors.
Now certain operations become simpler (like working in log space; can add
instead of multiply), though some new steps become necessary. Move a
window over the Fourier transform and measure the strengths of the voice
natural frequencies f0, f1, f2…
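A minimal sketch of the first of these steps, assuming NumPy and SciPy are available, with a synthetic tone standing in for real speech:

# Sketch: convert a time-pressure signal into a time-frequency-energy
# representation (a spectrogram) with a windowed Fourier transform.
import numpy as np
from scipy import signal

fs = 16000                              # sample rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)         # one second of samples
wave = np.sin(2 * np.pi * 440 * t)      # synthetic 440 Hz tone standing in for speech

freqs, times, Sxx = signal.spectrogram(wave, fs=fs, nperseg=400, noverlap=240)
print(Sxx.shape)   # (frequency bins, time frames): energy per frequency per window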
[Figure: ASR system: a small phone lattice with transition probabilities such as /b/ 0.3, /a/ 0.5, /t/ 0.6.]
2. Evaluation
Measures
The principal measure is Word Error Rate (WER): it measures how many words
were recognized correctly in a known test sample.
WER = (S + I + D) * 100 / N
where S is the number of substituted words, I the number of inserted words, D the
number of deleted words, and N the number of words in the reference transcript.
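A small sketch of computing WER with a word-level edit distance (standard dynamic programming; the example sentences are invented):

# Sketch: word error rate via word-level edit distance (dynamic programming).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] * 100.0 / len(ref)

print(wer("he leaves tomorrow", "she leaves today"))  # about 66.7 (2 errors / 3 words)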
3. Speech translation
The goal: a translating telephone.
Research projects at CMU, Karlsruhe, ISI, ARL, etc.
The Verbmobil project in Germany translated between German and
French using English as interlingua. A large multi-site consortium, it kept
NLP funded in Germany for almost a decade, starting mid-1990s.
One commercial product (PC based): NEC, for $300, in 2002. They now sell
a PDA based version. 50,000 words, bigrams, some parsing and a little
semantic transfer.
4. Prosody
An increasingly interesting topic today is the recognition of emotion and
other pragmatic signals in addition to the words. Human-human speech
is foundationally mediated by prosody (rhythm, intonation, etc., of
speech). Speech is only natural when it is not ‘flat’: we infer a great
deal about a speaker’s inner state and goals from prosody.
Prosody is characterized by two attributes:
Prominence: Intonation, rhythm, and lexical stress patterns, which
signal emphasis, intent, emotion
Phrasing: Chunking of utterances into prosodic phrases, which
assists with correct interpretation. For example, the sentence “he
leaves tomorrow” can be said four ways (as a statement, question,
command, or sarcastically), each with a different prosodic contour.
To handle prosody, you need to develop:
Suitable representation of prosody
Algorithms to automatically detect prosody
Methods to integrate these detectors in speech applications
To represent prosody, you extract features from the pitch contours of last
200 msec of utterances and then convert the parameters into a
discretized (categorical) notation.
5. Current status
Applications:
1. General-purpose dictation: Several commercial systems for $100:
DragonDictate (used to be Dragon Systems; by Jim Baker); now at
Nuance (www.nuance.com)
IBM ViaVoice (from Jim Baker at IBM)
Whatever was left when Lernout and Hauspie (was Kurzweil) went
bankrupt
Kai-Fu Lee takes SPHINX from CMU to Apple (PlainTalk) and then to
Microsoft Beijing
Windows Speech Recognition in Windows Vista
2. Military: Handheld devices for speech-to-speech translation in Iraq
and elsewhere. Also used in fighter planes where the pilot’s hands are
too busy to type.
3. Healthcare: ASR for doctors in order to create patient records
automatically.
4. Autos: Speech devices take driver input and display routes, maps, etc.
5. Help for the disabled (esp. to access the web and control the computer).
Speech Synthesis:
Traditional model
Enhance the sentence with a prosody contour; this also needs the speech act and
focus/stress as input.
Concatenative synthesis
Record a speaker many times and create a lexicon of sounds for letters, in word
start/middle/end, sentence start/middle/end, stressed/unstressed, etc. forms.
Text-to-Speech:
Text-to-speech (TTS) is a type of assistive technology that reads digital text
aloud. It’s sometimes called “read aloud” technology.
With a click of a button or the touch of a finger, TTS can take words on a
computer or other digital device and convert them into audio. TTS is very
helpful for kids who struggle with reading. But it can also help kids with writing
and editing, and even focusing.
The voice in TTS is computer-generated, and reading speed can usually be sped
up or slowed down. Voice quality varies, but some voices sound human. There
are even computer-generated voices that sound like children speaking.
Many TTS tools highlight words as they are read aloud. This allows kids to see
text and hear it at the same time.
And since TTS lets kids both see and hear text when reading, it creates
a multisensory reading experience; researchers have found that this
combination of seeing and hearing text can support reading. Depending on
the device your child uses, there are many different TTS tools available.
It’s a good idea to start the conversation with your child’s teacher if you think
your child would benefit from TTS. If your child has an Individualized Education
Program (IEP) or a 504 plan, your child has a right to the assistive technology
she needs to learn. But even without an IEP or a 504 plan, a school may be
willing to provide TTS if it can help your child.
Chapter 5
Introduction to Image Processing & Computer Vision
The full form of the pixel is "Picture Element." It is also known as "PEL." Pixel is
the smallest element of an image on a computer display, whether they are LCD
or CRT monitors. A screen is made up of a matrix of thousands or millions of
pixels. A pixel is represented with a dot or a square on a computer screen.
A digital image like the ones you view on your computer screen may look like a
picture, but it is actually nothing but a two-dimensional array of numbers ranging
between 0 and 255.
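A tiny NumPy illustration of this idea (the pixel values below are made up):

# A grayscale digital image is simply a 2-D array of intensities in [0, 255].
import numpy as np

image = np.array([[  0,  64, 128],
                  [ 64, 128, 192],
                  [128, 192, 255]], dtype=np.uint8)   # a 3x3 "image"

print(image.shape)   # (3, 3): rows x columns of pixels
print(image[0, 2])   # 128: the pixel in row 0, column 2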
The analog image processing is applied on analog signals and it processes only
two-dimensional signals. The images are manipulated by electrical signals. In
analog image processing, analog signals can be periodic or non-periodic.
Examples of analog images are television images, photographs, paintings, and
medical images etc.
There are several differences between analog image processing and digital
image processing; most importantly, digital image processing manipulates discrete
arrays of numbers with a computer, whereas analog processing manipulates
continuous electrical signals.
Overlapping fields:
Machine/Computer vision
Computer graphics
Computer graphics deals with the formation of images from object models,
rather than from images captured by some device. For example, object
rendering: generating an image from an object model.
Artificial intelligence
Artificial intelligence is more or less the study of putting human intelligence
into machines. Artificial intelligence has many applications in image
processing. For example: developing computer-aided diagnosis systems that
help doctors interpret images such as X-rays and MRI scans and then highlight
conspicuous sections to be examined by the doctor.
Image Noise:
Noise represents unwanted information which deteriorates image quality.
Noise is a random variation of image intensity and is visible as grain in the
image. Noise means that pixels within the picture show intensity values
different from their correct values. Noise originates from the physical nature of
detection processes and has many specific forms and causes. Noise is defined
as a process (n) which affects the acquired image (f) and is not part of the
scene (initial signal s), so the noise model can be written as f(i, j) = s(i, j) +
n(i, j). Digital image noise may come from various sources. The acquisition
process for digital images converts optical signals into electrical signals and
then into digital signals, and is one process by which noise is introduced
in digital images.
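A small NumPy sketch of this additive model, with Gaussian noise standing in for n and a synthetic constant-gray scene standing in for s:

# Additive noise model: observed image f = true scene s + noise n.
import numpy as np

rng = np.random.default_rng(0)
s = np.full((4, 4), 100.0)                          # synthetic "scene": constant gray level 100
n = rng.normal(loc=0.0, scale=10.0, size=s.shape)   # zero-mean Gaussian noise, sigma = 10
f = np.clip(s + n, 0, 255)                          # acquired image, kept in the valid [0, 255] range

print(f.round(1))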
TYPES OF NOISE:
During image acquisition or transmission, several factors are responsible for
introducing noise in the image. Depending on the type of disturbance, the
noise can affect the image to different extents. Our main concern is to remove
certain kinds of noise, so we first have to identify the type of noise and
apply different algorithms to remove it. The common types of noise are:
1. Salt Pepper Noise
2. Poisson Noise
3. Gaussian Noise
4. Speckle Noise
2. Poisson Noise:
Poisson or shot (photon) noise is the noise that is caused when the number
of photons sensed by the sensor is not sufficient to provide detectable
statistical information. Shot noise exists because phenomena such as
light and electric current consist of the movement of discrete packets. Shot
noise may dominate when the finite number of particles that carry
energy is sufficiently small that uncertainties due to the Poisson
distribution, which describes the occurrence of independent random events,
are significant. The magnitude of this noise increases with the average
magnitude of the current or the intensity of the light.
3. Gaussian Noise:
Gaussian noise is evenly distributed over the signal. This means that each
pixel in the noisy image is the sum of the true pixel value and a random
Gaussian-distributed noise value. The noise is independent of the intensity
of the pixel value at each point. A special case is white Gaussian noise, in
which the values at any pair of times are identically distributed and
statistically independent. White noise draws its name from white light.
The principal sources of Gaussian noise in digital images arise during
acquisition (for example, sensor noise caused by poor illumination or high
temperature) and during transmission.
4. Speckle Noise:
Speckle noise is multiplicative noise, unlike Gaussian and salt-and-pepper
noise. This noise can be modelled by random value multiplication with the
pixel values of the image and can be expressed as
P = I + n * I, where P is the speckle-noise image, I is the input
image and n is uniform noise with mean 0 and variance v.
Speckle noise is commonly observed in radar sensing system, although it
may appear in any type of remotely sensed image utilizing coherent
radiation. Like the light from a laser, the waves emitted by active sensors
travel in phase and interact minimally on their way to the target area.
Reducing the effect of speckle noise permits both better discrimination
of scene targets and easier automatic image segmentation.
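A small NumPy sketch of this multiplicative model (the input image and noise range are synthetic, purely for illustration):

# Speckle (multiplicative) noise: P = I + n * I, with n uniform and zero-mean.
import numpy as np

rng = np.random.default_rng(1)
I = np.full((4, 4), 120.0)                  # synthetic input image
n = rng.uniform(-0.2, 0.2, size=I.shape)    # uniform noise with mean 0
P = np.clip(I + n * I, 0, 255)              # speckled image

print(P.round(1))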
Linear Filtering: Linear filters are used to remove certain types of noise. These
filters remove noise by convolving the original image with a mask that
represents a low-pass filter or smoothing operation. The output of a linear
operation due to the sum of two inputs is the same as performing the
operation on the inputs individually and then summing the results. These
filters also tend to blur the sharp edges, destroy the lines and other fine details
of the image. Linear methods are fast but they do not preserve the details of
the image.
Non-Linear Filtering: A non-linear filter is one whose output is not a linear
function of its inputs. Non-linear filters preserve the details of the image. They
have many applications, especially the removal of certain types of
noise that are not additive. Non-linear filters are considerably harder to use
and design than linear ones.
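The contrast can be sketched in a few lines, assuming SciPy is available: a mean (averaging) filter is a typical linear filter, while a median filter is the classic non-linear choice for impulse-like noise.

# Linear vs non-linear denoising: mean (averaging) filter vs median filter.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
img = np.full((32, 32), 100.0)
# add a few salt-and-pepper-style outliers
img[rng.integers(0, 32, 20), rng.integers(0, 32, 20)] = 255

mean_filtered = ndimage.uniform_filter(img, size=3)    # linear: local average, blurs edges
median_filtered = ndimage.median_filter(img, size=3)   # non-linear: removes outliers, keeps edges

print(mean_filtered.max().round(1), median_filtered.max())  # the median filter suppresses the spikes better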
Color Enhancement:
Image enhancement is the process of adjusting digital images so that the
results are more suitable for display or further image analysis. For example,
you can remove noise, sharpen, or brighten an image, making it easier
to identify key features.
Segmentation:
In digital image processing and computer vision, image segmentation is the
process of partitioning a digital image into multiple segments (sets of pixels,
also known as image objects). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more meaningful
and easier to analyze.[1][2] Image segmentation is typically used to locate
objects and boundaries (lines, curves, etc.) in images. More precisely, image
segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover
the entire image, or a set of contours extracted from the image (see edge
detection). Each of the pixels in a region are similar with respect to some
characteristic or computed property, such as color, intensity, or texture.
Adjacent regions are significantly different with respect to the same
characteristic(s).[1] When applied to a stack of images, typical in medical
imaging, the resulting contours after image segmentation can be used to
create 3D reconstructions with the help of interpolation algorithms
like marching cubes.
Some of the practical applications of image segmentation include autonomous driving.
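One very simple segmentation approach is global thresholding; below is a minimal sketch using Otsu's method from scikit-image (the image and "object" are synthetic, purely for illustration):

# Simple segmentation by global thresholding (Otsu's method).
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(3)
image = rng.normal(60, 5, size=(64, 64))                   # darker background
image[20:40, 20:40] = rng.normal(180, 5, size=(20, 20))    # brighter "object"

t = threshold_otsu(image)     # pick the threshold automatically
mask = image > t              # True for object pixels, False for background

print(t, mask.sum())          # threshold value and number of object pixels (~400)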
Edge Detection:
Edge detection includes a variety of mathematical methods that aim at
identifying points in a digital image at which the image brightness changes
sharply or, more formally, has discontinuities. The points at which image
brightness changes sharply are typically organized into a set of curved line
segments termed edges. The same problem of finding discontinuities in one-
dimensional signals is known as step detection and the problem of finding
signal discontinuities over time is known as change detection. Edge detection is
a fundamental tool in image processing, machine vision and computer vision,
particularly in the areas of feature detection and feature extraction.
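As a minimal sketch of gradient-based edge detection (a Sobel operator from SciPy applied to a synthetic step edge):

# Gradient-based edge detection with the Sobel operator.
import numpy as np
from scipy import ndimage

image = np.zeros((8, 8))
image[:, 4:] = 255.0                 # a vertical step edge between columns 3 and 4

gx = ndimage.sobel(image, axis=1)    # horizontal gradient
gy = ndimage.sobel(image, axis=0)    # vertical gradient
magnitude = np.hypot(gx, gy)         # edge strength at every pixel

print(np.argmax(magnitude[4]))       # the strongest response sits on the step edge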
Motivation:
The purpose of detecting sharp changes in image brightness is to capture
important events and changes in properties of the world. It can be shown that
under rather general assumptions for an image formation model,
discontinuities in image brightness are likely to correspond to:[2][3]
discontinuities in depth,
discontinuities in surface orientation,
changes in material properties and
variations in scene illumination.
In the ideal case, the result of applying an edge detector to an image may lead
to a set of connected curves that indicate the boundaries of objects, the
boundaries of surface markings as well as curves that correspond to
discontinuities in surface orientation. Thus, applying an edge detection
algorithm to an image may significantly reduce the amount of data to be
processed and may therefore filter out information that may be regarded as
less relevant, while preserving the important structural properties of an image.
If the edge detection step is successful, the subsequent task of interpreting the
information contents in the original image may therefore be substantially
simplified. However, it is not always possible to obtain such ideal edges from
real life images of moderate complexity.
Edges extracted from non-trivial images are often hampered by fragmentation,
meaning that the edge curves are not connected, missing edge segments as
well as false edges not corresponding to interesting phenomena in the image –
thus complicating the subsequent task of interpreting the image data.[4]
Edge detection is one of the fundamental steps in image processing, image
analysis, image pattern recognition, and computer vision techniques.
Feature Detection:
In computer vision and image processing feature detection includes methods
for computing abstractions of image information and making local decisions at
every image point whether there is an image feature of a given type at that
point or not. The resulting features will be subsets of the image domain, often
in the form of isolated points, continuous curves or connected regions.
Definition of Feature:
A feature is typically defined as an "interesting" part of an image, and features
are used as a starting point for many computer vision algorithms.
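Corner points are one common kind of feature; a minimal sketch with scikit-image's Harris corner detector (the image is synthetic, purely for illustration):

# Feature (corner) detection sketch: Harris corner response + peak picking.
import numpy as np
from skimage.feature import corner_harris, corner_peaks

image = np.zeros((40, 40))
image[10:30, 10:30] = 1.0                      # a bright square: its 4 corners are features

response = corner_harris(image)                # corner strength at every pixel
corners = corner_peaks(response, min_distance=5)

print(corners)                                 # approximately the four corners of the square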
Recognition:
Image recognition, in the context of machine vision, is the ability of software
to identify objects, places, people, writing and actions in images. Computers
can use machine vision technologies in combination with a camera
and artificial intelligence software to achieve image recognition.
While human and animal brains recognize objects with ease, computers have
difficulty with the task. Software for image recognition requires deep machine
learning. Performance is best on convolutional neural network processors, since the
task is compute-intensive and would otherwise require massive amounts of power.
Image recognition algorithms can function by use of
comparative 3D models, appearances from different angles using edge
detection or by components. Image recognition algorithms are often trained
on millions of pre-labeled pictures with guided computer learning.
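As a sketch of the kind of model involved, the following defines a tiny, untrained convolutional network with Keras; the input size and number of classes are assumptions for illustration only.

# Minimal convolutional neural network sketch for image classification (Keras).
# Untrained; a real image-recognition model is far deeper and trained on
# millions of labelled images.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),             # 64x64 RGB input images (assumed size)
    layers.Conv2D(16, 3, activation="relu"),     # learn local visual patterns
    layers.MaxPooling2D(),                       # downsample
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),      # scores for 10 hypothetical object classes
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()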