
UNIT I

Chapter 1
Introduction to Artificial Intelligence
Artificial Intelligence is an approach to making a computer, a robot, or a product think the way intelligent humans think. AI is the study of how the human brain thinks, learns, decides, and works when it tries to solve problems, and this study ultimately produces intelligent software systems. The aim of AI is to improve computer functions which are related to human knowledge, for example reasoning, learning, and problem solving.

The term "Artificial Intelligence" was coined by John McCarthy in 1956.

Intelligence is intangible. It is composed of:

● Reasoning
● Learning
● Problem Solving
● Perception
● Linguistic Intelligence

The objectives of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and the ability to move and manipulate objects. General intelligence is among the field's long-term goals.
Definition of AI: The definition of AI varies from person to person. A few definitions are given below:
By an IBM developer

AI is about imparting the ability to think and learn to machines.

By Steven Waslander, Ph.D.

Associate Professor, University of Toronto

AI is a set of technologies that allows us to extract knowledge from data.

Definitions of AI are commonly organised into four categories: systems that think like humans, systems that think rationally, systems that act like humans, and systems that act rationally.

Systems that think like humans:
● "The exciting new effort to make computers think... machines with minds, in the full and literal sense" (Haugeland, 1985)
● "[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning..." (Bellman, 1978)

Systems that think rationally:
● "The study of mental faculties through the use of computational models" (Charniak and McDermott, 1985)
● "The study of the computations that make it possible to perceive, reason, and act" (Winston, 1992)

Systems that act like humans:
● "The art of creating machines that perform functions that require intelligence when performed by people" (Kurzweil, 1990)
● "The study of how to make computers do things at which, at the moment, people are better" (Rich and Knight, 1991)

Systems that act rationally:
● "A field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes" (Schalkoff, 1990)
● "The branch of computer science that is concerned with the automation of intelligent behaviour" (Luger and Stubblefield, 1993)

AI is a multidisciplinary field

AI draws on Computer Science, Mathematics & Statistics, Electrical Engineering, and Software & Hardware, among other disciplines.

1. AI is the fusion of many fields of study.

2. AI is modelled with the help of psychology, linguistics, and philosophy.
Psychology and Linguistics explain the central role of how AI works.
Philosophy provides guidance on intelligence and ethical considerations.

Advantages of AI:
A few advantages of AI are mentioned below:

1) Reduction in Human Error:

The phrase "human error" exists because humans make mistakes from time to time. Computers, however, do not make these mistakes if they are programmed properly. With Artificial Intelligence, decisions are taken from previously gathered information by applying a certain set of algorithms, so errors are reduced and a greater degree of precision and accuracy becomes possible.
Example: AI-based weather forecasting has reduced a large share of human error.
2) Takes Risks Instead of Humans:
This is one of the biggest advantages of Artificial Intelligence. We can overcome many of the risky limitations of humans by developing AI robots which can in turn do the risky things for us. Whether it is going to Mars, defusing a bomb, exploring the deepest parts of the oceans, or mining for coal and oil, AI can be used effectively in any kind of natural or man-made disaster.
Example: Consider the Chernobyl nuclear power plant explosion in Ukraine. At that time there were no AI-powered robots that could help us minimize the effect of the radiation by controlling the fire in its early stages, and any human who went close to the core died within minutes. Eventually, sand and boron were poured from helicopters from a distance.
AI robots can be used in such situations where human intervention would be hazardous.
3) Available 24x7:
An average human works for 4-6 hours a day, excluding breaks. Humans are built in such a way that they need time out to refresh themselves and get ready for a new day of work, and they even have weekly offs to balance their work life and personal life. Using AI, however, we can make machines work 24x7 without any breaks, and they do not even get bored, unlike humans.
Example: Educational institutes and helpline centres receive many queries and issues which can be handled effectively using AI.
4) Helping in Repetitive Jobs:
In our day-to-day work we perform many repetitive tasks, such as sending a thank-you mail, verifying documents for errors, and many more. Using Artificial Intelligence we can productively automate these mundane tasks, remove "boring" work from humans, and free them up to be increasingly creative.
Example: In banks we often see many verifications of documents before a loan is granted, which is a repetitive task for the bank. Using AI-based cognitive automation, the bank can speed up the process of verifying documents, benefiting both the customers and the bank.
5) Digital Assistance:
Some highly advanced organizations use digital assistants to interact with users, which reduces the need for human resources. Digital assistants are also used on many websites to provide what users are looking for; we can chat with them about what we want. Some chatbots are designed in such a way that it becomes hard to determine whether we are chatting with a chatbot or a human being.
Example: Organizations have customer support teams that need to clarify the doubts and queries of customers. Using AI, organizations can set up a voice bot or chatbot to help customers with all their queries. Many organizations have already started using them on their websites and mobile applications.
6) Faster Decisions:
Using AI alongside other technologies, we can make machines take decisions and carry out actions faster than a human. While taking a decision, a human analyzes many factors both emotionally and practically, whereas an AI-powered machine works on what it is programmed to do and delivers results faster.
Example: We have all played the chess game bundled with Windows. It is nearly impossible to beat the CPU in hard mode because of the AI behind the game; it makes the best possible move in a very short time according to the algorithms it uses.
7) Daily Applications:
Daily applications such as Apple's Siri, Microsoft's Cortana, and Google's "OK Google" are frequently used in our daily routine, whether it is for searching a location, taking a selfie, making a phone call, replying to a mail, and many more.
Example: Around 20 years ago, when we were planning to go somewhere, we used to ask a person who had already been there for directions. Now all we have to do is say "OK Google, where is Visakhapatnam?" and it shows Visakhapatnam's location on Google Maps along with the best route.
8) New Inventions:
AI is powering many inventions in almost every domain, which will help humans solve the majority of complex problems.
Example: Doctors can now predict breast cancer in women at earlier stages using advanced AI-based technologies.

Disadvantages of AI:
A few disadvantages of AI are:

1) High Costs of Creation:

As AI is updated every day, the hardware and software need to be updated over time to meet the latest requirements. Machines need repair and maintenance, which come at considerable cost. Their creation requires huge costs, as they are very complex machines.

2) Making Humans Lazy:

AI is making humans lazy, as its applications automate the majority of work. Humans tend to get addicted to these inventions, which can cause a problem for future generations.

3) Unemployment:

As AI replaces the majority of repetitive tasks and other work with robots, human involvement is decreasing, which will cause a major problem for employment standards. Every organization is looking to replace minimally qualified individuals with AI robots that can do similar work with more efficiency.

4) No Emotions:

There is no doubt that machines are much better when it comes to working efficiently, but they cannot replace the human connection that makes a team. Machines cannot develop a bond with humans, which is an essential attribute of team management.

5) Lacking Out of Box Thinking:

Machines can perform only those tasks which they are designed or programmed to do; for anything outside that, they tend to crash or give irrelevant outputs, which can be a major drawback.
Applications of AI:
Artificial Intelligence has dramatically changed the business landscape. What started as rule-based automation is now capable of mimicking human interaction. It is not just the human-like capabilities that make Artificial Intelligence unique: an advanced AI algorithm offers far better speed and reliability at a much lower cost than its human counterparts. AI has impacted various fields like marketing, finance, banking, and so on. The various application domains of AI are:

1. AI In Marketing
2. AI In Banking
3. AI In Finance
4. AI In Agriculture
5. AI In HealthCare
6. AI In Gaming
7. AI In Space Exploration
8. AI In Autonomous Vehicles
9. AI In Chatbots
10. AI In Artificial Creativity

Marketing
Marketing is a way to sugar coat your products to attract more customers. We,
humans, are pretty good at sugar coating, but what if an algorithm or a bot is built
solely for the purpose of marketing a brand or a company? It would do a pretty
awesome job!

In the early 2000s, if we searched an online store for a product without knowing its exact name, finding the product was a nightmare. But now when we search for an item on any e-commerce store, we get all possible results related to the item; it's as if these search engines read our minds! In a matter of seconds, we get a list of all relevant items. An example of this is finding the right movies on Netflix.
Artificial Intelligence Applications – AI in Marketing

One reason why we’re all obsessed with Netflix and chill is because Netflix
provides highly accurate predictive technology based on customer’s reactions to films.
It examines millions of records to suggest shows and films that you might like based
on your previous actions and choices of films. As the data set grows, this technology is
getting smarter and smarter every day.

With the growing advancement of AI, in the near future it may be possible for consumers on the web to buy products by snapping a photo of them. Companies like CamFind and its competitors are already experimenting with this.

Banking
AI in banking is growing faster than you might think! A lot of banks have already adopted AI-based systems to provide customer support and to detect anomalies and credit card fraud. An example of this is HDFC Bank.

HDFC Bank has developed an AI-based chatbot called EVA (Electronic Virtual
Assistant), built by Bengaluru-based Senseforth AI Research.

Since its launch, Eva has addressed over 3 million customer queries, interacted with
over half a million unique users, and held over a million conversations. Eva can collect
knowledge from thousands of sources and provide simple answers in less than 0.4
seconds.
Artificial Intelligence Applications – AI in Banking

The use of AI for fraud prevention is not a new concept. In fact, AI solutions can
be used to enhance security across a number of business sectors, including retail and
finance.

By tracing card usage and endpoint access, security specialists are more
effectively preventing fraud. Organizations rely on AI to trace those steps by analyzing
the behaviors of transactions.

Companies such as MasterCard and RBS WorldPay have relied on AI and Deep
Learning to detect fraudulent transaction patterns and prevent card fraud for years
now. This has saved millions of dollars.

Finance
Ventures have been relying on computers and data scientists to determine
future patterns in the market. Trading mainly depends on the ability to predict the
future accurately.

Machines are great at this because they can crunch a huge amount of data in a
short span. Machines can also learn to observe patterns in past data and predict how
these patterns might repeat in the future.

In the age of ultra-high-frequency trading, financial organizations are turning to


AI to improve their stock trading performance and boost profit.
Artificial Intelligence Applications – AI in Finance

One such organization is Japan's leading brokerage house, Nomura Securities.

The company has been relentlessly pursuing one goal: to analyze the insights of experienced stock traders with the help of computers. After years of research, Nomura is set to introduce a new stock trading system.

The new system stores a vast amount of price and trading data in its computer. By tapping into this reservoir of information, it makes assessments; for example, it may determine that current market conditions are similar to the conditions of two weeks ago and predict how share prices will change a few minutes down the line. This will help make better trading decisions based on the predicted market prices.

Agriculture
Here’s an alarming fact, the world will need to produce 50 percent more food
by 2050 because we’re literally eating up everything! The only way this can be
possible is if we use our resources more carefully. With that being said, AI can help
farmers get more from the land while using resources more sustainably.

Issues such as climate change, population growth, and food security concerns
have pushed the industry into seeking more innovative approaches to improve crop
yield.

Organizations are using automation and robotics to help farmers find more
efficient ways to protect their crops from weeds.
Artificial Intelligence Applications – AI in Agriculture

Blue River Technology has developed a robot called See & Spray which uses
computer vision technologies like object detection to monitor and precisely spray
weedicide on cotton plants. Precision spraying can help prevent herbicide resistance.

Apart from this, a Berlin-based agricultural tech start-up called PEAT has developed an application called Plantix that identifies potential defects and nutrient deficiencies in the soil through images.

The image recognition app identifies possible defects through images captured
by the user’s smartphone camera. Users are then provided with soil restoration
techniques, tips, and other possible solutions. The company claims that its software
can achieve pattern detection with an estimated accuracy of up to 95%.

HealthCare
When it comes to saving our lives, a lot of organizations and medical care
centers are relying on AI. There are many examples of how AI in healthcare has
helped patients all over the world.

An organization called Cambio HealthCare has developed a clinical decision support system for stroke prevention that can give the physician a warning when a patient is at risk of having a stroke.
Artificial Intelligence Applications – AI in Healthcare

Another such example is Coala Life, a company that has a digitalized device that can detect cardiac diseases.

Similarly, Aifloo is developing a system for keeping track of how people are doing in nursing homes, home care, etc. The best thing about AI in healthcare is that you don't even need to develop a new medication; just by using an existing medication in the right way, you can also save lives.

Gaming
Over the past few years, Artificial Intelligence has become an integral part of
the gaming industry. In fact, one of the biggest accomplishments of AI is in the gaming
industry.

DeepMind’s AI-based AlphaGo software, which is known for defeating Lee


Sedol, the world champion in the game of GO, is considered to be one of the most
significant accomplishments in the field of AI.

Shortly after the victory, DeepMind created an advanced version of AlphaGo


called AlphaGo Zero which defeated the predecessor in an AI-AI face off. Unlike the
original AlphaGo, which DeepMind trained over time by using a large amount of data
and supervision, the advanced system, AlphaGo Zero taught itself to master the
game.

Other examples of Artificial Intelligence in gaming include the First Encounter


Assault Recon, popularly known as F.E.A.R, which is a first-person shooter video
game.
But what makes this game so special?

The actions taken by the opponent AI are unpredictable because the game is
designed in such a way that the opponents are trained throughout the game and
never repeat the same mistakes. They get better as the game gets harder. This makes
the game very challenging and prompts the players to constantly switch strategies
and never sit in the same position.

Space Exploration
Space expeditions and discoveries always require analyzing vast amounts of data. Artificial Intelligence and Machine Learning are the best way to handle and process data on this scale. After rigorous research, astronomers used Artificial Intelligence to sift through years of data obtained by the Kepler telescope in order to identify a distant eight-planet solar system.
Artificial Intelligence is also being used for NASA's next rover mission to Mars, the Mars 2020 rover. AEGIS, an AI-based targeting system, is already on the red planet aboard a Mars rover; it is responsible for the autonomous targeting of cameras in order to perform investigations on Mars.

Autonomous Vehicles
For the longest time, self-driving cars have been a buzzword in the AI industry. The development of autonomous vehicles will definitely revolutionize the transport system.

Companies like Waymo conducted several test drives in Phoenix before


deploying their first AI-based public ride-hailing service. The AI system collects data from the vehicle's radar, cameras, GPS, and cloud services to produce control signals that operate the vehicle.

Advanced Deep Learning algorithms can accurately predict what objects in the
vehicle’s vicinity are likely to do. This makes Waymo cars more effective and safer.

Another famous example of an autonomous vehicle is Tesla’s self-driving car.


Tesla's cars use computer vision, image detection, and deep learning to automatically detect objects and drive around without human intervention.

Elon Musk talks a great deal about how AI is implemented in Tesla's self-driving cars and Autopilot features. He stated that,

“Tesla will have fully self-driving cars ready by the end of the year and a “robotaxi”
version – one that can ferry passengers without anyone behind the wheel – ready
for the streets next year”.
Chatbots
These days, virtual assistants have become a very common technology. Almost every household has a virtual assistant that controls the appliances at home. A few examples include Siri and Cortana, which are gaining popularity because of the user experience they provide.

Amazon’s Echo is an example of how Artificial Intelligence can be used to


translate human language into desirable actions. This device uses speech recognition
and NLP to perform a wide range of tasks on your command. It can do more than just
play your favorite songs. It can be used to control the devices at your house, book
cabs, make phone calls, order your favorite food, check the weather conditions and so
on.

Another example is Google's recently released virtual assistant called Google Duplex, which has astonished millions of people. Not only can it respond to calls and book appointments for you, it also adds a human touch.
The device uses natural language processing and machine learning algorithms to process human language and perform tasks such as managing your schedule, controlling your smart home, and making reservations.

Social Media
Ever since social media has become our identity, we’ve been generating an
immeasurable amount of data through chats, tweets, posts and so on. And wherever
there is an abundance of data, AI and Machine Learning are always involved.

In social media platforms like Facebook, AI is used for face verification wherein
machine learning and deep learning concepts are used to detect facial features and
tag your friends. Deep Learning is used to extract every minute detail from an image
by using a bunch of deep neural networks. On the other hand, Machine learning
algorithms are used to design your feed based on your interests.
Another such example is Twitter’s AI, which is being used to identify hate
speech and terroristic language in tweets. It makes use of Machine Learning, Deep
Learning, and Natural language processing to filter out offensive content. The
company discovered and banned 300,000 terrorist-linked accounts, 95% of which
were found by non-human, artificially intelligent machines.

Artificial Creativity
Have you ever wondered what would happen if an artificially intelligent
machine tried to create music and art?

An AI-based system called MuseNet can now compose classical music that
echoes the classical legends, Bach and Mozart.

MuseNet is a deep neural network that is capable of generating 4-minute


musical compositions with 10 different instruments and can combine styles from
country to Mozart to the Beatles.

MuseNet was not explicitly programmed with an understanding of music, but


instead discovered patterns of harmony, rhythm, and style by learning on its own.
Another creative product of Artificial Intelligence is a content automation tool
called Wordsmith. Wordsmith is a natural language generation platform that can
transform your data into insightful narratives.

Tech giants such as Yahoo, Microsoft, and Tableau are using Wordsmith to generate around 1.5 billion pieces of content every year.

Advanced search:
What is a Search Algorithm in AI?

Before we get started let's define what search in AI means.

A search algorithm is not the same thing as a search engine.


Search in AI is the process of navigating from a starting state to a goal state by transitioning through intermediate states. Almost any AI problem can be defined in these terms.

State — A potential outcome of a problem

Transition — The act of moving between states.

Starting State — Where to start searching from.

Intermediate State — The states between the starting state and the goal state that we need to transition through.

Goal State — The state to stop searching.

Search Space — A collection of states.

Path Cost: It is a function which assigns a numeric cost to each path.

Optimal Solution: If a solution has the lowest cost among all solutions.

Search tree: A tree representation of a search problem is called a search tree. The root of the search tree is the root node, which corresponds to the initial state.

The solution to a search problem is a sequence of actions, called the plan, that transforms the start state into the goal state.

This plan is achieved through search algorithms.
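To make these terms concrete, here is a minimal Python sketch of a search problem wrapped in a small class. The class and method names (SearchProblem, is_goal, successors) and the tiny example graph are illustrative assumptions, not part of the original notes.

# A minimal, illustrative search-problem interface. A concrete problem
# supplies the start state, the goal test, and the transitions with costs.
class SearchProblem:
    def __init__(self, start, goal, graph):
        self.start = start          # starting state
        self.goal = goal            # goal state
        self.graph = graph          # {state: [(next_state, step_cost), ...]}

    def is_goal(self, state):
        return state == self.goal

    def successors(self, state):
        """Return the states reachable from 'state' with their step costs."""
        return self.graph.get(state, [])

# Example: a tiny search space with starting state S and goal state G.
problem = SearchProblem('S', 'G', {'S': [('A', 1), ('B', 2)],
                                   'A': [('G', 4)], 'B': [('G', 1)]})
print(problem.successors('S'))   # [('A', 1), ('B', 2)]
print(problem.is_goal('G'))      # True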

Properties of Search Algorithms:

Following are the four essential properties of search algorithms used to compare their efficiency:
Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution whenever at least one solution exists for any random input.
Optimality: If a solution found for an algorithm is guaranteed to be the best solution
(lowest path cost) among all other solutions, then such a solution is said to be an
optimal solution.
Time Complexity: Time complexity is a measure of time for an algorithm to complete
its task.
Space Complexity: It is the maximum storage space required at any point during the search, expressed in terms of the complexity of the problem.

Types of search algorithms

Based on the search problem, we can classify search algorithms into uninformed (blind) search and informed (heuristic) search algorithms.

Search Algorithms:
● Uninformed Search: Breadth First Search, Depth First Search, Uniform Cost Search
● Informed Search: Best First Search, A* Search

Uninformed Search:
Uninformed search is used when there is no information about the cost of navigating between states.
Uninformed search does not use any domain knowledge, such as closeness or the location of the goal. It operates in a brute-force way, as it only includes information about how to traverse the tree and how to identify leaf and goal nodes.
Uninformed search searches the tree without any information about the search space, such as the initial state, operators, and goal test, so it is also called blind search. It examines each node of the tree until it reaches the goal node.

Informed Search:

An informed search is used when we know the cost or have a solid estimate of the
cost between states.
Informed search algorithms use domain knowledge. In an informed search,
problem information is available which can guide the search. Informed search
strategies can find a solution more efficiently than an uninformed search strategy.
Informed search is also called a Heuristic search.
A heuristic is a technique that is not always guaranteed to find the best solution but is guaranteed to find a good solution in a reasonable time.
Uninformed Search Algorithms
Uninformed search is a class of general-purpose search algorithms which operate in a brute-force way. Uninformed search algorithms do not have additional information about the state or search space other than how to traverse the tree, so it is also called blind search.
Breadth-first Search:

● Breadth-first search is the most common search strategy for traversing a tree or
graph. This algorithm searches breadthwise in a tree or graph, so it is called
breadth-first search.
● BFS algorithm starts searching from the root node of the tree and expands all
successor nodes at the current level before moving to nodes of next level.
● The breadth-first search algorithm is an example of a general-graph search
algorithm.
● Breadth-first search is implemented using a FIFO queue data structure.
Advantages:

● BFS will provide a solution if any solution exists.


● If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one requiring the least number of steps.
Disadvantages:

● It requires lots of memory since each level of the tree must be saved into
memory to expand the next level.
● BFS needs lots of time if the solution is far away from the root node.
Example:
In the below tree structure, we have shown the traversal of the tree using the BFS algorithm from the root node S to the goal node K. The BFS algorithm traverses in layers, so it will follow the path shown by the dotted arrow, and the traversed path will be:
1. S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K

Time Complexity: The time complexity of the BFS algorithm can be obtained from the number of nodes traversed in BFS until the shallowest goal node, where d = depth of the shallowest solution and b = branching factor (the maximum number of successors of any node):
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)
Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which means that if the shallowest goal node is at some finite depth, then BFS will find a solution.
Optimality: BFS is optimal if the path cost is a non-decreasing function of the depth of the node.
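As a hedged illustration (not part of the original notes), the following Python sketch implements breadth-first search with a FIFO queue. The example tree is reconstructed so that its level-by-level traversal matches the order S, A, B, C, D, G, H, E, F, I, K listed above; the exact edges of the missing figure may differ.

from collections import deque

def breadth_first_search(graph, start, goal):
    """BFS on a tree/graph given as {node: [children]}.  A FIFO queue expands
    every node at the current level before moving to the next level and
    returns the path with the fewest steps, or None."""
    frontier = deque([[start]])      # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                frontier.append(path + [child])
    return None

# Tree reconstructed from the traversal order above (an assumption).
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
        'C': ['E'], 'D': ['F'], 'G': ['I'], 'H': ['K']}
print(breadth_first_search(tree, 'S', 'K'))   # ['S', 'B', 'H', 'K']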

Depth-first Search
● Depth-first search is a recursive algorithm for traversing a tree or graph data
structure.
● It is called the depth-first search because it starts from the root node and
follows each path to its greatest depth node before moving to the next path.
● DFS uses a stack data structure for its implementation.
● The process of the DFS algorithm is similar to the BFS algorithm.
Note: Backtracking is an algorithm technique for finding all possible solutions using
recursion.
Advantages:

● DFS requires very little memory, as it only needs to store the stack of nodes on the path from the root node to the current node.
● It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantages:

● There is a possibility that many states keep recurring, and there is no guarantee of finding the solution.
● The DFS algorithm goes deep down into the search and may sometimes enter an infinite loop.
Example:
In the below search tree, we have shown the flow of depth-first search, and it will
follow the order as:
Root node--->Left node ----> right node.
It will start searching from root node S and traverse A, then B, then D and E; after traversing E, it will backtrack, as E has no other successor and the goal node has still not been found. After backtracking, it will traverse node C and then G, where it will terminate, as it has found the goal node.
Completeness: The DFS algorithm is complete within a finite state space, as it will expand every node within a bounded search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm. It is given by:
T(b) = 1 + b + b^2 + ... + b^m = O(b^m)
where m = maximum depth of any node, which can be much larger than d (the depth of the shallowest solution).
Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimal: The DFS algorithm is non-optimal, as it may take a large number of steps or incur a high cost to reach the goal node.
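As a hedged illustration (not from the original notes), the following recursive Python sketch implements depth-first search. The example tree is reconstructed so that its visiting order matches S, A, B, D, E, C, G as described above; the edges of the missing figure may differ.

def depth_first_search(graph, start, goal, path=None, visited=None):
    """Recursive DFS on {node: [children]}.  Follows each branch to its
    deepest node, then backtracks.  Returns one path to the goal (not
    necessarily the shortest), or None."""
    if path is None:
        path, visited = [start], {start}
    node = path[-1]
    if node == goal:
        return path
    for child in graph.get(node, []):
        if child not in visited:
            visited.add(child)
            result = depth_first_search(graph, start, goal,
                                        path + [child], visited)
            if result is not None:
                return result          # goal found down this branch
    return None                        # dead end: backtrack

# Tree reconstructed from the traversal described above (an assumption).
tree = {'S': ['A', 'C'], 'A': ['B'], 'B': ['D', 'E'], 'C': ['G']}
print(depth_first_search(tree, 'S', 'G'))   # ['S', 'C', 'G']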

Uniform-cost Search Algorithm:


Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This algorithm comes into play when a different cost is available for each edge. The primary goal of uniform-cost search is to find a path to the goal node with the lowest cumulative cost. Uniform-cost search expands nodes according to their path costs from the root node. It can be used to solve any graph/tree where an optimal cost is required. A uniform-cost search algorithm is implemented using a priority queue, which gives maximum priority to the lowest cumulative cost. Uniform-cost search is equivalent to the BFS algorithm if the path cost of all edges is the same.
Advantages:

● Uniform cost search is optimal because at every state the path with the least
cost is chosen.
Disadvantages:

● It does not care about the number of steps involved in the search; it is only concerned with path cost, due to which the algorithm may get stuck in an infinite loop.
Example:

Completeness:
Uniform-cost search is complete: if a solution exists, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution and ε the smallest cost of a step toward the goal node. Then the number of steps is at most C*/ε + 1 (we add 1 because we start from state 0 and end at C*/ε).
Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + [C*/ε])).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + [C*/ε])).
Optimal:
Uniform-cost search is always optimal, as it only selects the path with the lowest path cost.
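The sketch below (an illustration, not from the original notes) implements uniform-cost search with a priority queue ordered by cumulative path cost; the small weighted graph is an assumption made for the example.

import heapq

def uniform_cost_search(graph, start, goal):
    """UCS on a weighted graph {node: [(neighbour, edge_cost), ...]}.
    The priority queue always pops the node with the lowest cumulative
    path cost, so the first time the goal is popped the path is optimal."""
    frontier = [(0, start, [start])]           # (path_cost, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbour, step_cost in graph.get(node, []):
            if neighbour not in explored:
                heapq.heappush(frontier,
                               (cost + step_cost, neighbour, path + [neighbour]))
    return None

# Illustrative weighted graph: the cheapest route to G costs 5 via B.
graph = {'S': [('A', 1), ('B', 4)], 'A': [('G', 5)], 'B': [('G', 1)]}
print(uniform_cost_search(graph, 'S', 'G'))   # (5, ['S', 'B', 'G'])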

Informed Search Algorithms:


Informed search algorithms are more useful for large search spaces. They use the idea of a heuristic, so informed search is also called heuristic search.
Heuristic function: A heuristic is a function used in informed search to find the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in a reasonable time. The heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path between the given state and the goal state. The value of the heuristic function is always positive.
Admissibility of the heuristic function is given as:
1. h(n) <= h*(n)

Here h(n) is the heuristic (estimated) cost and h*(n) is the actual cost of reaching the goal from node n. Hence, for admissibility, the heuristic cost must be less than or equal to the actual cost.

Best-first Search Algorithm (Greedy Search):


The greedy best-first search algorithm always selects the path which appears best at that moment. It is a combination of depth-first search and breadth-first search algorithms. It uses the heuristic function to guide the search. Best-first search allows us to take the advantages of both algorithms. With the help of best-first search, at each step we can choose the most promising node. In the best-first search algorithm, we expand the node which is closest to the goal node, where closeness is estimated by the heuristic function, i.e.
f(n) = h(n)
where h(n) = estimated cost from node n to the goal.
The greedy best-first algorithm is implemented using a priority queue.

Best first search algorithm:

● Step 1: Place the starting node into the OPEN list.


● Step 2: If the OPEN list is empty, Stop and return failure.
● Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and place it in the CLOSED list.
● Step 4: Expand the node n, and generate the successors of node n.
● Step 5: Check each successor of node n, and find whether any of them is a goal node. If any successor node is a goal node, then return success and terminate the search, else proceed to Step 6.
● Step 6: For each successor node, the algorithm checks its evaluation function f(n) and then checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.
● Step 7: Return to Step 2.

Advantages:

● Best first search can switch between BFS and DFS by gaining the advantages of
both the algorithms.
● This algorithm is more efficient than BFS and DFS algorithms.

Disadvantages:

● It can behave as an unguided depth-first search in the worst case scenario.


● It can get stuck in a loop as DFS.
● This algorithm is not optimal.
Example:
Consider the below search problem, which we will traverse using greedy best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n), whose values are given in the below table.

In this search example, we are using two lists which are OPEN and CLOSED
Lists. Following are the iterations for traversing the above example.
Expand the nodes of S and put in the CLOSED list
Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]
: Open [E, A], Closed [S, B, F]
Iteration 3: Open [I, G, E, A], Closed [S, B, F]
: Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S----> B----->F----> G
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space.
Complete: Greedy best-first search is also incomplete, even if the given state space is
finite.
Optimal: Greedy best first search algorithm is not optimal.
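Below is a hedged Python sketch of greedy best-first search using an OPEN priority queue ordered by h(n). The graph and heuristic values are assumptions chosen to be consistent with the iterations listed above (the original figure and table are not reproduced here); with them the search returns the stated path S → B → F → G.

import heapq

def greedy_best_first_search(graph, h, start, goal):
    """Greedy best-first search: always expand the OPEN node with the
    smallest heuristic value, i.e. f(n) = h(n).  'graph' maps a node to its
    successors and 'h' maps a node to its estimated distance to the goal."""
    open_list = [(h[start], start, [start])]
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        closed.add(node)
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list,
                               (h[successor], successor, path + [successor]))
    return None

# Graph and heuristic values assumed for illustration.
graph = {'S': ['A', 'B'], 'B': ['E', 'F'], 'F': ['I', 'G']}
h = {'S': 13, 'A': 12, 'B': 4, 'E': 8, 'F': 2, 'I': 9, 'G': 0}
print(greedy_best_first_search(graph, h, 'S', 'G'))   # ['S', 'B', 'F', 'G']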

A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the cost to reach node n from the start state, g(n). It has the combined features of UCS and greedy best-first search, by which it solves the problem efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function. This search algorithm expands fewer nodes of the search tree and provides optimal results faster. The A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.
At each point in the search space, only those nodes are expanded which have the
lowest value of f(n), and the algorithm terminates when the goal node is found.

Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure
and stops.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation
function (g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list.
For each successor n', check whether n' is already in the OPEN or CLOSED list, if not
then compute the evaluation function for n' and place it into the Open list.
Step 5: Else if node n' is already in OPEN and CLOSED, then it should be attached to
the back pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.

Advantages:

● The A* search algorithm performs better than many other search algorithms.


● A* Search algorithm is optimal and complete.
● This algorithm can solve very complex problems.

Disadvantages:

● It does not always produce the shortest path as it is mostly based on heuristics
and approximation.
● A* Search algorithm has some complexity issues.
● The main drawback of A* is memory requirement as it keeps all generated
nodes in the memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of all states is given in the below table, so we will calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost to reach the node from the start state.
Here we will use the OPEN and CLOSED lists.

Solution:
Initialization: {(S, 5)}
Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal path
with cost 6.
Points to remember:

● The A* algorithm returns the path which occurred first, and it does not search for all remaining paths.
● The efficiency of the A* algorithm depends on the quality of the heuristic.
● The A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the cost of the optimal solution.
Complete: A* algorithm is complete as long as:

● Branching factor is finite.


● Cost at every action is fixed.
Optimal: A* search algorithm is optimal if it follows below two conditions:

● Admissible: the first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
● Consistency: the second required condition is consistency, which is needed for A* graph search.
If the heuristic function is admissible, then A* tree search will always find the least-cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function, and the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of the A* search algorithm is O(b^d).
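The following Python sketch (an illustration, not the notes' own code) implements A* by ordering the OPEN list on f(n) = g(n) + h(n). The edge costs and heuristic values are reconstructed from the f-values in the iterations above, since the original graph and table are not reproduced here, so treat them as assumptions; the sketch returns the stated optimal path S → A → C → G with cost 6.

import heapq

def a_star_search(graph, h, start, goal):
    """A* search: expand the OPEN node with the smallest f(n) = g(n) + h(n),
    where g(n) is the cost accumulated so far and h(n) is an admissible
    estimate of the remaining cost.  'graph' maps a node to
    [(neighbour, edge_cost), ...]."""
    open_list = [(h[start], 0, start, [start])]      # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        for neighbour, step_cost in graph.get(node, []):
            new_g = g + step_cost
            if new_g < best_g.get(neighbour, float('inf')):
                best_g[neighbour] = new_g             # better route found
                heapq.heappush(open_list, (new_g + h[neighbour], new_g,
                                           neighbour, path + [neighbour]))
    return None

# Costs and heuristic values reconstructed from the iterations (assumptions).
graph = {'S': [('A', 1), ('G', 10)], 'A': [('B', 2), ('C', 1)],
         'C': [('D', 3), ('G', 4)]}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}
print(a_star_search(graph, h, 'S', 'G'))   # (6, ['S', 'A', 'C', 'G'])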

Constraint satisfaction problems:


Consider a Sudoku game with some numbers filled in initially in some squares. You are expected to fill the empty squares with numbers ranging from 1 to 9 in such a way that no row, column, or block has a number repeating itself. This is a very basic constraint satisfaction problem. You are supposed to solve a problem keeping some constraints in mind. The remaining squares that are to be filled are known as variables, and the range of numbers (1-9) that can fill them is known as the domain. Variables take on values from the domain. The conditions governing which values a variable may take from its domain are known as constraints.

Definition:

A constraint satisfaction problem (CSP) is a problem whose solution must satisfy some limitations or conditions, also known as constraints. It consists of the following:

● A finite set of variables which store the solution (V = {V1, V2, V3, ..., Vn})
● A set of discrete values, known as the domain, from which the solution is picked (D = {D1, D2, D3, ..., Dn})
● A finite set of constraints (C = {C1, C2, C3, ..., Cn})
Please note that the elements in the domain can be either continuous or discrete, but in AI we generally deal only with discrete values.

Also, note that all these sets should be finite except for the domain set. Each variable in the variable set can have a different domain. For example, consider the Sudoku problem again. Suppose that a row, column, and block already have 3, 5, and 7 filled in. Then the domain for all the remaining variables in that row, column, and block will be {1, 2, 4, 6, 8, 9}.

Popular Problems with CSP


The following problems are some of the popular problems that can be solved using
CSP:

1. CryptArithmetic (Coding alphabets to numbers.)


2. n-Queen (In an n-queen problem, n queens should be placed on an n×n board such that no two queens share the same row, column, or diagonal.)
3. Map Coloring (coloring different regions of map, ensuring no adjacent
regions have the same color)
4. Crossword (everyday puzzles appearing in newspapers)
5. Sudoku (a number grid)
6. Latin Square Problem

A problem to be converted to CSP requires the following steps:

Step 1: Create a variable set.


Step 2: Create a domain set.
Step 3: Create a constraint set with variables and domains (if possible) after
considering the constraints.
Step 4: Find an optimal solution.
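To show how the four steps fit together, here is a minimal Python sketch of a backtracking solver for a map-colouring CSP; the regions, adjacency, and colours are illustrative assumptions, not taken from the notes.

def solve_csp(variables, domains, neighbours, assignment=None):
    """Backtracking search: assign each variable a value from its domain such
    that no two neighbouring variables share a value (the constraint here)."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return assignment                         # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint check: no adjacent region may already have this colour.
        if all(assignment.get(n) != value for n in neighbours[var]):
            assignment[var] = value
            result = solve_csp(variables, domains, neighbours, assignment)
            if result is not None:
                return result
            del assignment[var]                   # backtrack
    return None

# Step 1: variables, Step 2: domains, Step 3: constraints (via adjacency),
# Step 4: search for a solution.  Regions and adjacency are illustrative.
variables = ['WA', 'NT', 'SA', 'Q']
domains = {v: ['red', 'green', 'blue'] for v in variables}
neighbours = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
              'SA': ['WA', 'NT', 'Q'], 'Q': ['NT', 'SA']}
print(solve_csp(variables, domains, neighbours))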

Knowledge representation & reasoning:


Humans are best at understanding, reasoning, and interpreting knowledge. Humans know things, which is knowledge, and as per their knowledge they perform various actions in the real world. How machines do all these things comes under knowledge representation and reasoning.
What is Knowledge? Knowledge, n.: assured belief; that which is known; information; …

In order to solve the complex problems encountered in AI, one generally needs a
large amount of knowledge, and suitable mechanisms for representing and
manipulating all that knowledge. Knowledge can take many forms.

Some simple examples are:

John has an umbrella

It is raining

An umbrella stops you getting wet when it’s raining


An umbrella will only stop you getting wet if it is used properly

Umbrellas are not so useful when it is very windy

So, how should an AI agent store and manipulate knowledge like this?
What is Knowledge Representation?
Knowledge Representation in AI describes how knowledge can be represented in a machine. Basically, it is the study of how the beliefs, intentions, and judgements of an intelligent agent can be expressed suitably for automated reasoning. One of the primary purposes of Knowledge Representation is modelling intelligent behaviour for an agent.

Knowledge Representation and Reasoning (KR, KRR) represents information


from the real world for a computer to understand and then utilize this knowledge to
solve complex real-life problems like communicating with human beings in natural
language. Knowledge representation in AI is not just about storing data in a database,
it allows a machine to learn from that knowledge and behave intelligently like a
human being.

The different kinds of knowledge that need to be represented in AI include:

● Object: All the facts about objects in our world domain. E.g., Guitars contain
strings, trumpets are brass instruments.
● Events: Events are the actions which occur in our world.
● Performance: It describes behaviour which involves knowledge about how to
do things.
● Meta-knowledge: It is knowledge about what we know.
● Facts: Facts are the truths about the real world and what we represent.
● Knowledge-Base: The central component of knowledge-based agents is the knowledge base, represented as KB. The knowledge base is a group of sentences (here, 'sentence' is used as a technical term and is not identical to a sentence in the English language).
Different Types of Knowledge
There are 5 types of Knowledge such as:
● Declarative Knowledge – It includes concepts, facts, and objects and expressed
in a declarative sentence.
● Structural Knowledge – It is a basic problem-solving knowledge that describes
the relationship between concepts and objects.
● Procedural Knowledge – This is responsible for knowing how to do something
and includes rules, strategies, procedures, etc.
● Meta Knowledge – Meta Knowledge defines knowledge about other types of
Knowledge.
● Heuristic Knowledge – This represents some expert knowledge in the field or
subject.

Cycle of Knowledge Representation in AI


Artificial Intelligent Systems usually consist of various components to display their
intelligent behavior. Some of these components include:

● Perception
● Learning
● Knowledge Representation & Reasoning
● Planning
● Execution

Here is an example to show the different components of the system and how it
works:

Example
The above diagram shows the interaction of an AI system with the real world and the
components involved in showing intelligence.

● The Perception component retrieves data or information from the environment. With the help of this component, the system can retrieve data from the environment, find out the source of noises, and check whether it has been damaged by anything. It also defines how to respond when any sense has been detected.
● Then, there is the Learning Component that learns from the captured data by
the perception component. The goal is to build computers that can be taught
instead of programming them. Learning focuses on the process of self-
improvement. In order to learn new things, the system requires knowledge
acquisition, inference, acquisition of heuristics, faster searches, etc.
● The main component in the cycle is Knowledge Representation and Reasoning
which shows the human-like intelligence in the machines. Knowledge
representation is all about understanding intelligence. Instead of trying to
understand or build brains from the bottom up, its goal is to understand and
build intelligent behavior from the top-down and focus on what an agent needs
to know in order to behave intelligently. Also, it defines how automated
reasoning procedures can make this knowledge available as needed.
● The Planning and Execution components depend on the analysis of knowledge
representation and reasoning. Here, planning includes giving an initial state,
finding their preconditions and effects, and a sequence of actions to achieve a
state in which a particular goal holds. Now once the planning is completed, the
final stage is the execution of the entire process.

What is the Relation between Knowledge & Intelligence?


In the real world, knowledge plays a vital role in intelligence as well as creating
artificial intelligence. It demonstrates the intelligent behavior in AI agents or systems.
It is possible for an agent or system to act accurately on some input only when it has
the knowledge or experience about the input.

Let’s take an example to understand the relationship:

In this example, there is one decision-maker whose actions are justified by sensing the
environment and using knowledge. But, if we remove the knowledge part here, it will
not be able to display any intelligent behavior.
Techniques of Knowledge Representation in AI
There are four techniques of representing knowledge such as:
Logical Representation
Logical representation is a language with some definite rules which deals with propositions and has no ambiguity in representation. It represents a conclusion based on various conditions and lays down some important communication rules. It also consists of precisely defined syntax and semantics which support sound inference. Each sentence can be translated into logic using syntax and semantics.

Syntax
● It decides how we can construct legal sentences in the logic.
● It determines which symbols we can use in knowledge representation.
● Also, it specifies how to write those symbols.

Semantics
● Semantics are the rules by which we can interpret the sentences in the logic.
● It assigns a meaning to each sentence.

Logic can be further divided as:

Propositional Logic: This technique is also known as propositional calculus,


statement logic, or sentential logic. It is used for representing the knowledge
about what is true and what is false.
First-order Logic: It is also known as Predicate Logic or First-Order Predicate Logic (FOPL). This technique is used to represent objects in the form of predicates and quantifiers. It differs from propositional logic in that it can express the internal structure of sentences, which propositional logic cannot. In short, FOPL is an advanced version of propositional logic.

Advantages:
● Logical representation helps to perform logical reasoning.
● This representation is the basis for the programming languages.
Disadvantages:
● Logical representations have some restrictions and are challenging to work
with.
● This technique may not be very natural, and inference may not be very
efficient.

Semantic Network Representation


Semantic networks work as an alternative to predicate logic for knowledge representation. In semantic networks, you can represent your knowledge in the form of graphical networks. The network consists of nodes representing objects and arcs which describe the relationships between those objects. It also categorizes objects in different forms and links those objects.
This representation consists of two types of relations:
● IS-A relation (inheritance)
● Kind-of relation

Advantages:

● Semantic networks are a natural representation of knowledge.


● Also, it conveys meaning in a transparent manner.
● These networks are simple and easy to understand.
Disadvantages:

● Semantic networks take more computational time at runtime.


● Also, these are inadequate as they do not have any equivalent quantifiers.
● These networks are not intelligent and depend on the creator of the system.
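As a hedged illustration of the idea described above, the short Python sketch below stores a semantic network as nodes with labelled relations and resolves properties by following IS-A links; the facts and names used are assumptions for the example.

# A tiny semantic network: nodes carry labelled relations such as IS-A,
# and properties are inherited by following IS-A arcs upwards.
network = {
    'Canary': {'is_a': 'Bird', 'colour': 'yellow'},
    'Bird':   {'is_a': 'Animal', 'can': 'fly'},
    'Animal': {'has': 'skin'},
}

def lookup(node, relation):
    """Follow IS-A links until the requested relation is found (inheritance)."""
    while node is not None:
        facts = network.get(node, {})
        if relation in facts:
            return facts[relation]
        node = facts.get('is_a')
    return None

print(lookup('Canary', 'can'))   # 'fly'  (inherited from Bird via IS-A)
print(lookup('Canary', 'has'))   # 'skin' (inherited from Animal)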

Frame Representation
A frame is a record-like structure that consists of a collection of attributes and their values to describe an entity in the world. Frames are AI data structures that divide knowledge into substructures by representing stereotyped situations. Basically, a frame consists of a collection of slots and slot values of any type and size. Slots have names and values, which are called facets.

Advantages:

● It makes the programming easier by grouping the related data.


● Frame representation is easy to understand and visualize.
● It is very easy to add slots for new attributes and relations.
● Also, it is easy to include default data and search for missing values.

Disadvantages:

● In a frame system, the inference mechanism cannot be easily processed.

● The inference mechanism cannot proceed smoothly with frame representation.
● It has a very generalized approach.
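The sketch below (illustrative only; the hotel-room frame and its slots are assumptions) shows how a frame can be held as a collection of slots whose facets include a value and a default.

# A frame as a record-like structure: named slots, each with facets.
hotel_room = {
    'frame_name': 'Hotel Room',
    'is_a': 'Room',
    'slots': {
        'bed':      {'value': 'hotel bed'},
        'phone':    {'value': 'hotel phone'},
        'capacity': {'default': 2},        # default facet, no explicit value
    },
}

def slot_value(frame, slot):
    """Return the slot's 'value' facet, falling back to its 'default' facet."""
    facets = frame['slots'].get(slot, {})
    return facets.get('value', facets.get('default'))

print(slot_value(hotel_room, 'bed'))       # 'hotel bed'
print(slot_value(hotel_room, 'capacity'))  # 2  (taken from the default facet)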

Production Rules
In production rules, the agent checks for the condition, and if the condition holds, the production rule fires and the corresponding action is carried out. The condition part of the rule determines which rule may be applied to a problem, whereas the action part carries out the associated problem-solving steps. This complete process is called a recognize-act cycle.
The production rules system consists of three main parts:
● The set of production rules
● Working Memory
● The recognize-act-cycle
Advantages:

● The production rules are expressed in natural language.


● The production rules are highly modular and can be easily removed or
modified.

Disadvantages:

● It does not exhibit any learning capability and does not store the results of a problem for future use.
● During the execution of the program, many rules may be active; thus, rule-based production systems can be inefficient.
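A minimal Python sketch of the recognize-act cycle follows; the rules and the initial working memory are made-up examples, not drawn from the notes.

# Production system: (condition, action) rules over a working memory of facts.
rules = [
    ({'bus arrives'}, 'get on the bus'),
    ({'get on the bus', 'seat is empty'}, 'sit down'),
]
working_memory = {'bus arrives', 'seat is empty'}

fired = True
while fired:                                   # the recognize-act cycle
    fired = False
    for condition, action in rules:
        # Recognize: the rule fires when its condition part matches memory.
        if condition <= working_memory and action not in working_memory:
            working_memory.add(action)         # Act: carry out the action part
            fired = True

print(working_memory)
# {'bus arrives', 'seat is empty', 'get on the bus', 'sit down'} (set order varies)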

Representation Requirements
A good knowledge representation system must have properties such as:

● Representational Accuracy: It should represent all kinds of required knowledge.


● Inferential Adequacy: It should be able to manipulate the representational
structures to produce new knowledge corresponding to the existing structure.
● Inferential Efficiency: The ability to direct the inferential knowledge mechanism
into the most productive directions by storing appropriate guides.
● Acquisitional efficiency: The ability to acquire new knowledge easily using
automatic methods.

Approaches to Knowledge Representation in AI


There are different approaches to knowledge representation such as:

1. Simple Relational Knowledge


It is the simplest way of storing facts using the relational method. Here, all the facts about a set of objects are set out systematically in columns. This approach to knowledge representation is common in database systems, where the relationships between different entities are represented. Thus, there is little opportunity for inference.

Example:

Name      Age    Emp ID
John      25     100071
Amanda    23     100056
Sam       27     100042

This is an example of representing simple relational knowledge.

2. Inheritable Knowledge
In the inheritable knowledge approach, all data must be stored in a hierarchy of classes and should be arranged in a generalized form or a hierarchical manner. This approach contains inheritable knowledge which shows a relation between an instance and a class, called the instance relation. In this approach, objects and values are represented in boxed nodes.

Example:

3. Inferential Knowledge

The inferential knowledge approach represents knowledge in the form of formal logic.
Thus, it can be used to derive more facts. Also, it guarantees correctness.
Example:

Statement 1: John is a cricketer.

Statement 2: All cricketers are athletes.

Then it can be represented as;

Cricketer(John)

∀x Cricketer(x) → Athlete(x)
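As a hedged illustration of how such formal rules can be used to derive new facts, the sketch below forward-chains the rule Cricketer(x) → Athlete(x) over the known facts; the helper structure is an assumption, not part of the notes.

# Forward chaining: apply "Cricketer(x) -> Athlete(x)" to the known facts.
facts = {('Cricketer', 'John')}
rules = [('Cricketer', 'Athlete')]            # premise predicate -> conclusion

derived = True
while derived:
    derived = False
    for premise, conclusion in rules:
        for predicate, subject in list(facts):
            if predicate == premise and (conclusion, subject) not in facts:
                facts.add((conclusion, subject))   # derives Athlete(John)
                derived = True

print(facts)   # {('Cricketer', 'John'), ('Athlete', 'John')}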

Procedural knowledge:

● The procedural knowledge approach uses small programs and code which describe how to do specific things and how to proceed.
● In this approach, one important rule is used, namely the If-Then rule.
● With this knowledge, we can use various coding languages such as LISP and Prolog.
● We can easily represent heuristic or domain-specific knowledge using this approach.
● However, it is not necessarily possible to represent all cases with this approach.
Requirements for a Knowledge Representation System:
A good knowledge representation system must possess the following properties.

1. Representational Accuracy: the system should have the ability to represent all kinds of required knowledge.
2. Inferential Adequacy: the system should have the ability to manipulate the representational structures to produce new knowledge corresponding to existing structures.
3. Inferential Efficiency: the ability to direct the inferential knowledge mechanism into the most productive directions by storing appropriate guides.
4. Acquisitional Efficiency: the ability to acquire new knowledge easily using automatic methods.
Challenges/Issues in Knowledge Representation

● Important Attributes: Some basic attributes occur in almost every problem
domain and must be represented consistently.
● Relationship among attributes: Any important relationship that exists among
object attributes should be representable.
● Choosing Granularity: How much detail of the knowledge needs to be
represented?
● Set of Objects: How should sets of objects be represented?
● Finding the right structure: Knowledge bases can grow very large; the question
is how to access the relevant information out of the whole.

Reasoning:
Reasoning is the mental process of deriving logical conclusions and making
predictions from available knowledge, facts, and beliefs. In other words, "Reasoning is
a way to infer facts from existing data." It is a general process of thinking rationally
to find valid conclusions.
In artificial intelligence, reasoning is essential so that a machine can think rationally,
much as a human brain does, and act accordingly.
Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:

● Deductive reasoning
● Inductive reasoning
● Abductive reasoning
● Common Sense Reasoning
● Monotonic Reasoning
● Non-monotonic Reasoning
Note: Inductive and deductive reasoning are the forms of propositional logic.
1. Deductive reasoning:
Deductive reasoning is deducing new information from logically related known
information. It is a form of valid reasoning, which means the argument's conclusion
must be true when the premises are true.
Deductive reasoning in AI is based on propositional logic and requires various rules
and facts. It is sometimes referred to as top-down reasoning, in contrast to
inductive reasoning.
In deductive reasoning, the truth of the premises guarantees the truth of the
conclusion.
Deductive reasoning mostly proceeds from general premises to a specific
conclusion, as in the example below.

Example:

Premise 1: All humans eat veggies.


Premise 2: Suresh is a human.
Conclusion: Suresh eats veggies.
The general process of deductive reasoning is given below:

2. Inductive Reasoning:
Inductive reasoning is a form of reasoning that arrives at a conclusion from a limited
set of facts by the process of generalization. It starts with a series of specific facts or
data points and reaches a general statement or conclusion.
Inductive reasoning is based on propositional logic and is also known as cause-
effect reasoning or bottom-up reasoning.
In inductive reasoning, we use historical data or various premises to generate a
generic rule, for which premises support the conclusion.
In inductive reasoning, premises provide probable supports to the conclusion, so the
truth of premises does not guarantee the truth of the conclusion.

Example:

Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all the pigeons to be white.
3. Abductive reasoning:
Abductive reasoning is a form of logical reasoning which starts with single or multiple
observations then seeks to find the most likely explanation or conclusion for the
observation.
Abductive reasoning is an extension of deductive reasoning, but in abductive
reasoning, the premises do not guarantee the conclusion.

Example:

Implication: The cricket ground is wet if it is raining.


Axiom (observation): The cricket ground is wet.
Conclusion: It is (most likely) raining.
4. Common Sense Reasoning
Common sense reasoning is an informal form of reasoning that is gained
through experience.
It simulates the human ability to make presumptions about events that occur
every day.
It relies on good judgment rather than exact logic and operates on heuristic
knowledge and heuristic rules.

Example:

1. A person can be in only one place at a time.


2. If I put my hand in a fire, it will burn.
The above two statements are examples of common sense reasoning, which a
human mind can easily understand and assume.
5. Monotonic Reasoning:
In monotonic reasoning, once a conclusion is drawn, it remains the same
even if we add more information to the existing knowledge base.
In monotonic reasoning, adding knowledge does not decrease the set of propositions
that can be derived.
To solve monotonic problems, we derive valid conclusions from the available
facts only, and those conclusions are not affected by new facts.
Monotonic reasoning is not suitable for real-time systems, because facts change in
real time.
Monotonic reasoning is used in conventional reasoning systems, and logic-based
systems are monotonic.
Any form of theorem proving is an example of monotonic reasoning.

Example:

● Earth revolves around the Sun.


It is a true fact, and it cannot be changed even if we add another sentence in the
knowledge base like, "The moon revolves around the earth" Or "Earth is not round,"
etc.
Advantages of Monotonic Reasoning:

● In monotonic reasoning, each old proof will always remain valid.


● If we deduce some facts from the available facts, they will remain valid
forever.
Disadvantages of Monotonic Reasoning:

● We cannot represent real-world scenarios using monotonic reasoning.


● Hypothetical knowledge cannot be expressed with monotonic reasoning, which
means facts must be true.
● Since we can only derive conclusions from old proofs, new knowledge
from the real world cannot be added.
6. Non-monotonic Reasoning
In Non-monotonic reasoning, some conclusions may be invalidated if we add some
more information to our knowledge base.
Logic will be said as non-monotonic if some conclusions can be invalidated by adding
more knowledge into our knowledge base.
Non-monotonic reasoning deals with incomplete and uncertain models.
"Human perceptions for various things in daily life, "is a general example of non-
monotonic reasoning.
Example: Let suppose the knowledge base contains the following knowledge:

● Birds can fly


● Penguins cannot fly
● Pitty is a bird
From the above sentences, we can conclude that Pitty can fly.
However, if we add one more sentence to the knowledge base, "Pitty is a penguin",
it entails "Pitty cannot fly" and invalidates the above conclusion.
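The penguin example can be mirrored in a small default-reasoning sketch; the way the knowledge base is encoded below is an assumption made purely for illustration.

# A minimal, illustrative sketch of non-monotonic (default) reasoning:
# "birds can fly" is a default that is retracted when we learn Pitty is a penguin.
def can_fly(kb, name):
    if ("penguin", name) in kb:      # exception overrides the default
        return False
    return ("bird", name) in kb      # default rule: birds can fly

kb = {("bird", "Pitty")}
print(can_fly(kb, "Pitty"))          # True  -- concluded from the default

kb.add(("penguin", "Pitty"))         # new knowledge arrives
print(can_fly(kb, "Pitty"))          # False -- the earlier conclusion is invalidated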
Advantages of Non-monotonic reasoning:

● For real-world systems such as Robot navigation, we can use non-monotonic


reasoning.
● In Non-monotonic reasoning, we can choose probabilistic facts or can make
assumptions.
Disadvantages of Non-monotonic Reasoning:

● In non-monotonic reasoning, the old facts may be invalidated by adding new


sentences.
● It cannot be used for theorem proving.

Difference between Inductive and Deductive reasoning

Definition
  Deductive Reasoning: The form of valid reasoning used to deduce new information or a conclusion from known, related facts and information.
  Inductive Reasoning: Arrives at a conclusion by the process of generalization using specific facts or data.

Approach
  Deductive Reasoning: Follows a top-down approach.
  Inductive Reasoning: Follows a bottom-up approach.

Starts from
  Deductive Reasoning: Starts from the premises.
  Inductive Reasoning: Starts from the conclusion.

Validity
  Deductive Reasoning: The conclusion must be true if the premises are true.
  Inductive Reasoning: The truth of the premises does not guarantee the truth of the conclusion.

Usage
  Deductive Reasoning: Harder to use, as we need facts that must be true.
  Inductive Reasoning: Fast and easy to use, as we need evidence instead of true facts; we often use it in daily life.

Process
  Deductive Reasoning: Theory → hypothesis → patterns → confirmation.
  Inductive Reasoning: Observations → patterns → hypothesis → theory.

Argument
  Deductive Reasoning: Arguments may be valid or invalid.
  Inductive Reasoning: Arguments may be weak or strong.

Structure
  Deductive Reasoning: Reaches from general facts to specific facts.
  Inductive Reasoning: Reaches from specific facts to general facts.
Non-standard logics:

Uncertain and probabilistic reasoning:


Uncertainty:
Till now, we have learned knowledge representation using first-order logic and
propositional logic with certainty, which means we were sure about the predicates.
With this kind of knowledge representation we might write A→B, which means if A is
true then B is true. But consider a situation where we are not sure whether A is true
or not; then we cannot express this statement with certainty. This situation is called
uncertainty.
To represent uncertain knowledge, where we are not sure about the predicates,
we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.

1. Information occurred from unreliable sources.


2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the
concept of probability to indicate the uncertainty in knowledge. In probabilistic
reasoning, we combine probability theory with logic to handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle
uncertainty that results from laziness (it is too much work to list every exception) and
ignorance (we lack complete knowledge of the domain).
In the real world, there are lots of scenarios, where the certainty of something is not
confirmed, such as "It will rain today," "behavior of someone for some situations," "A
match between two teams or two players." These are probable sentences for which
we can assume that it will happen but not sure about it, so here we use probabilistic
reasoning.
Need of probabilistic reasoning in AI:

● When there are unpredictable outcomes.


● When specifications or possibilities of predicates become too large to handle.
● When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:

● Bayes' rule
● Bayesian Statistics
Note: We will learn the above two rules in later chapters.
As probabilistic reasoning uses probability and related terms, so before understanding
probabilistic reasoning, let's understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur.
It is the numerical measure of the likelihood that an event will occur. The value of a
probability always lies between 0 and 1, the two extremes representing impossibility
and certainty.
● 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
● P(A) = 0 indicates that event A is impossible (it is certain not to occur).
● P(A) = 1 indicates that event A is certain to occur.
The probability of an event not happening can be found from the probability that it
does happen:

● P(¬A) = probability of event A not happening.
● P(¬A) + P(A) = 1, so P(¬A) = 1 − P(A).
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in
the real world.
Prior probability: The prior probability of an event is probability computed before
observing new information.
Posterior Probability: The probability that is calculated after all evidence or
information has been taken into account. It is a combination of prior probability and
new information.
Conditional probability:
Conditional probability is a probability of occurring an event when another event has
already happened.
Let's suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

Where P(A⋀B) = joint probability of A and B


P(B) = marginal probability of B.
If the probability of A is given and we need to find the probability of B given A, then it
will be given as:

P(B|A) = P(A⋀B) / P(A)

It can be explained using a Venn diagram: once B has occurred, the sample space is
reduced to the set B, and we can only calculate event A given that event B has already
occurred by dividing P(A⋀B) by P(B).
Example:
In a class, there are 70% of the students who like English and 40% of the students who
like English and mathematics, and then what is the percentage of students those who
like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics, and
B be the event that a student likes English. Then

P(A|B) = P(A⋀B) / P(B) = 0.40 / 0.70 ≈ 0.57

Hence, about 57% of the students who like English also like Mathematics.
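The same calculation can be checked with a few lines of code:

# A quick check of the worked example, using P(A|B) = P(A and B) / P(B).
p_english = 0.70            # P(B): student likes English
p_english_and_math = 0.40   # P(A and B): student likes both

p_math_given_english = p_english_and_math / p_english
print(round(p_math_given_english, 2))   # 0.57 -> about 57%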

Chapter 2
Introduction to Machine Learning
The term Machine Learning was first coined by Arthur Samuel in the year 1959.
Looking back, that year marked a significant milestone for the technological
advancements that followed.
If you browse through the net about ‘what is Machine Learning’, you’ll get at least
100 different definitions. However, the very first formal definition was given by
Tom M. Mitchell:

  “A computer program is said to learn from experience E with respect to some


class of tasks T and performance measure P if its performance at tasks in T, as
measured by P, improves with experience E.” 

In simple terms, Machine learning is a subset of Artificial Intelligence (AI) which


provides machines the ability to learn automatically & improve from experience
without being explicitly programmed to do so. In the sense, it is the practice of
getting Machines to solve problems by gaining the ability to think.

But wait, can a machine think or make decisions? Well, if you feed a machine a
good amount of data, it will learn how to interpret, process, and analyze this data
by using Machine Learning algorithms, in order to solve real-world problems.

Before moving any further, let’s discuss some of the most commonly used
terminologies in Machine Learning.

Machine Learning Definitions

Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques


used to learn patterns from data and draw significant information from it. It is the
logic behind a Machine Learning model. An example of a Machine Learning
algorithm is the Linear Regression algorithm.

Model: A model is the main component of Machine Learning. A model is trained


by using a Machine Learning Algorithm. An algorithm maps all the decisions that a
model is supposed to take based on the given input, in order to get the correct
output.

Predictor Variable: It is a feature(s) of the data that can be used to predict the
output.

Response Variable: It is the feature or the output variable that needs to be


predicted by using the predictor variable(s).
Training Data: The Machine Learning model is built using the training data. The
training data helps the model to identify key trends and patterns essential to
predict the output.

Testing Data: After the model is trained, it must be tested to evaluate how


accurately it can predict an outcome. This is done by the testing data set.

What is Machine Learning?

Machine learning is a subfield of artificial intelligence (AI). The goal of machine


learning generally is to understand the structure of data and fit that data into
models that can be understood and utilized by people.

According to Arthur Samuel, Machine Learning algorithms enable the computers


to learn from data, and even improve themselves, without being explicitly
programmed.

Machine learning (ML) is a category of algorithms that allows software


applications to become more accurate in predicting outcomes without being
explicitly programmed. The basic premise of machine learning is to build
algorithms that can receive input data and use statistical analysis to predict an
output, while updating outputs as new data becomes available.

Machine Learning Process

The Machine Learning process involves building a Predictive model that can be
used to find a solution for a Problem Statement. To understand the Machine
Learning process let’s assume that you have been given a problem that needs to
be solved by using Machine Learning.
The problem is to predict the occurrence of rain in your local area by using Machine
Learning.

The below steps are followed in a Machine Learning process:


Step 1: Define the objective of the Problem Statement
At this step, we must understand what exactly needs to be predicted. In our case,
the objective is to predict the possibility of rain by studying weather conditions. At
this stage, it is also essential to take mental notes on what kind of data can be
used to solve this problem or the type of approach you must follow to get to the
solution.
Step 2: Data Gathering
At this stage, you must be asking questions such as,
● What kind of data is needed to solve this problem?
● Is the data available?
● How can I get the data?
Once you know the type of data that is required, you must understand how you
can obtain this data. Data collection can be done manually or by web scraping.
However, if you're a beginner just looking to learn Machine Learning,
you don't have to worry about getting the data: there are thousands of data resources
on the web, so you can simply download a data set and get going.
Coming back to the problem at hand, the data needed for weather forecasting
includes measures such as humidity level, temperature, pressure, locality, whether
or not you live in a hill station, etc. Such data must be collected and stored for
analysis.

Step 3: Data Preparation

The data you collected is almost never in the right format. You will encounter a lot
of inconsistencies in the data set such as missing values, redundant variables,
duplicate values, etc. Removing such inconsistencies is very essential because they
might lead to wrongful computations and predictions. Therefore, at this stage, you
scan the data set for any inconsistencies and you fix them then and there.

Step 4: Exploratory Data Analysis

Grab your detective glasses because this stage is all about diving deep into data
and finding all the hidden data mysteries. EDA or Exploratory Data Analysis is the
brainstorming stage of Machine Learning. Data Exploration involves understanding
the patterns and trends in the data. At this stage, all the useful insights are drawn
and correlations between the variables are understood.

For example, in the case of predicting rainfall, we know that there is a strong
possibility of rain if the temperature has fallen low. Such correlations must be
understood and mapped at this stage.

Step 5: Building a Machine Learning Model

All the insights and patterns derived during Data Exploration are used to build the
Machine Learning Model. This stage always begins by splitting the data set into
two parts, training data, and testing data. The training data will be used to build
and analyze the model. The logic of the model is based on the Machine Learning
Algorithm that is being implemented.

In the case of predicting rainfall, since the output will be in the form of True (if it
will rain tomorrow) or False (no rain tomorrow), we can use a Classification
Algorithm such as Logistic Regression.

Choosing the right algorithm depends on the type of problem you’re trying to
solve, the data set and the level of complexity of the problem. In the upcoming
sections, we will discuss the different types of problems that can be solved by
using Machine Learning.
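To make Step 5 concrete, here is a minimal sketch assuming scikit-learn is installed; the tiny weather table and its feature names (humidity, temperature, pressure) are invented purely for illustration and are not part of the course data.

# A minimal sketch of Step 5: split the data and train a classification model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Columns: humidity (%), temperature (C), pressure (hPa) -- invented values
X = np.array([[85, 22, 1002], [40, 30, 1015], [90, 18, 998],
              [35, 33, 1020], [78, 21, 1005], [45, 29, 1012]])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = rain, 0 = no rain

# Split into training data and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)        # train on the training set
print(model.predict(X_test))       # predictions for the held-out test set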

Step 6: Model Evaluation & Optimization

After building a model by using the training data set, it is finally time to put the
model to a test. The testing data set is used to check the efficiency of the model
and how accurately it can predict the outcome. Once the accuracy is calculated,
any further improvements in the model can be implemented at this stage.
Methods like parameter tuning and cross-validation can be used to improve the
performance of the model.
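As an illustration of Step 6, the hedged sketch below uses scikit-learn's cross_val_score and GridSearchCV on a synthetic dataset; the dataset and the parameter grid are assumptions chosen only for the example.

# A minimal sketch of model evaluation and parameter tuning.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, GridSearchCV

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Cross-validation gives a less optimistic performance estimate than a single split
print("cv accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())

# Simple parameter tuning: search over the regularization strength C
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("best C:", grid.best_params_, "best cv accuracy:", grid.best_score_)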

Step 7: Predictions

Once the model is evaluated and improved, it is finally used to make predictions.
The final output can be a categorical variable (e.g., True or False) or it can be a
continuous quantity (e.g., the predicted value of a stock).

In our case, for predicting the occurrence of rainfall, the output will be a
categorical variable.

So that was the entire Machine Learning process. Now it’s time to learn about the
different ways in which Machines can learn.

Machine Learning Types

A machine can learn to solve a problem by following any one of the following three
approaches. These are the ways in which a machine can learn:

1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Supervised:
“Supervised learning is a technique in which we teach or train the machine using
data which is well labeled.”

To understand Supervised Learning let’s consider an analogy. As kids we all


needed guidance to solve math problems. Our teachers helped us understand
what addition is and how it is done. Similarly, you can think of supervised learning
as a type of Machine Learning that involves a guide. The labeled data set is the
teacher that will train you to understand patterns in the data. The labeled data set
is nothing but the training data set.

Supervised Learning – Introduction To Machine Learning


Consider the above figure. Here we’re feeding the machine images of Tom and
Jerry and the goal is for the machine to identify and classify the images into two
groups (Tom images and Jerry images). The training data set that is fed to the
model is labeled, as in, we’re telling the machine, ‘this is how Tom looks and this is
Jerry’. By doing so you’re training the machine by using labeled data. In Supervised
Learning, there is a well-defined training phase done with the help of labeled data.

Types of Supervised learning

● Classification: A classification problem is when the output variable is a
category, such as “red” or “blue” or “disease” and “no disease”.

● Regression: A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.

Unsupervised:
“Unsupervised learning involves training by using unlabeled data and allowing
the model to act on that information without guidance. “

Think of unsupervised learning as a smart kid that learns without any guidance.
In this type of Machine Learning, the model is not fed with labeled data, as in the
model has no clue that ‘this image is Tom and this is Jerry’, it figures out patterns
and the differences between Tom and Jerry on its own by taking in tons of data.
Unsupervised Learning – Introduction To Machine Learning

For example, it identifies prominent features of Tom such as pointy ears, bigger size,
etc, to understand that this image is of type 1. Similarly, it finds such features in Jerry
and knows that this image is of type 2. Therefore, it classifies the images into two
different classes without knowing who Tom is or Jerry is.

Types of Unsupervised learning

Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior (a small sketch follows
below).

Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
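As a concrete illustration of the clustering case mentioned above, here is a minimal sketch assuming scikit-learn is installed; the 2-D points are invented so that they form two obvious groups.

# A minimal clustering sketch using k-means on invented 2-D points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 1], [1.5, 2], [1, 1.8],     # group around (1, 1.5)
                   [8, 8], [8.5, 9], [9, 8.2]])    # group around (8.5, 8.4)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)            # cluster index assigned to each point
print(kmeans.cluster_centers_)   # the two discovered group centres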

Semi-Supervised Learning:
In semi-supervised learning, the algorithm is trained on a combination of labeled and
unlabeled data. Typically, this combination contains a very small amount of labeled
data and a very large amount of unlabeled data. The basic procedure is that the
programmer first clusters similar data using an unsupervised learning algorithm and
then uses the existing labeled data to label the rest of the unlabeled data. The typical
use cases of this type of algorithm share a common property: the acquisition of
unlabeled data is relatively cheap, while labeling that data is very expensive.

Intuitively, one may imagine the three types of learning algorithms as Supervised
learning where a student is under the supervision of a teacher at both home and
school, Unsupervised learning where a student has to figure out a concept himself
and Semi-Supervised learning where a teacher teaches a few concepts in class and
gives questions as homework which are based on similar concepts.

A Semi-Supervised algorithm assumes the following about the data –

1. Continuity Assumption: The algorithm assumes that the points which are closer
to each other are more likely to have the same output label.

2. Cluster Assumption: The data can be divided into discrete clusters and points in
the same cluster are more likely to share an output label.

3. Manifold Assumption: The data lie approximately on a manifold of much lower


dimension than the input space. This assumption allows the use of distances
and densities which are defined on a manifold.

Practical applications of Semi-Supervised Learning –

1. Speech Analysis: Since labeling of audio files is a very intensive task, Semi-


Supervised learning is a very natural approach to solve this problem.

2. Internet Content Classification: Labeling each webpage is an impractical and


infeasible process, so Semi-Supervised learning algorithms are used instead. Even
the Google search algorithm uses a variant of Semi-Supervised learning to rank the
relevance of a webpage for a given query.

3. Protein Sequence Classification: Since DNA strands are typically very large in


size, Semi-Supervised learning is a natural fit in this field.

Deep Learning:
What is Deep Learning?

Deep learning is a branch of machine learning that is based entirely on artificial


neural networks; since neural networks are designed to mimic the human brain, deep
learning can be seen as a kind of mimicry of the human brain as well. In deep learning,
we don't need to explicitly program everything. The concept of deep learning is not
new; it has been around for a number of years. It is prominent nowadays because
earlier we did not have as much processing power or data. As processing power has
increased exponentially over the last 20 years, deep learning and machine learning
have come into the picture.

A formal definition of deep learning is:

“Deep learning is a particular kind of machine learning that achieves great


power and flexibility by learning to represent the world as a nested hierarchy of
concepts, with each concept defined in relation to simpler concepts, and more
abstract representations computed in terms of less abstract ones”.

The human brain contains approximately 100 billion neurons altogether, and each
neuron is connected to thousands of its neighbours.
The question here is how do we recreate these neurons in a computer. So, we create
an artificial structure called an artificial neural net where we have nodes or neurons.
We have some neurons for input value and some for output value and in between,
there may be lots of neurons interconnected in the hidden layer.

Architectures :

1. Deep Neural Network – It is a neural network with a certain level of complexity


(having multiple hidden layers in between input and output layers). They are
capable of modeling and processing non-linear relationships.

2. Deep Belief Network (DBN) – It is a class of Deep Neural Network consisting of


multiple layers of belief networks.

Steps for performing DBN :

a. Learn a layer of features from visible units using Contrastive Divergence


algorithm.
b. Treat activations of previously trained features as visible units and then
learn features of features.
c. Finally, the whole DBN is trained when the learning for the final hidden
layer is achieved.

Advantages:
1. Best in-class performance on problems.
2. Reduces need for feature engineering.
3. Eliminates unnecessary costs.
4. Identifies defects easily that are difficult to detect.
Disadvantages:
1. Large amount of data required.
2. Computationally expensive to train.
3. No strong theoretical foundation.
Applications:
1. Automatic Text Generation – Corpus of text is learned and from this model
new text is generated, word-by-word or character-by-character.
Then this model is capable of learning how to spell, punctuate, form sentences,
or it may even capture the style.
2. Healthcare – Helps in diagnosing various diseases and treating them.
3. Automatic Machine Translation – Certain words, sentences or phrases in one
language are transformed into another language (Deep Learning is achieving top
results in the areas of text and images).
4. Image Recognition – Recognizes and identifies people and objects in images, as
well as understanding content and context. This area is already being used in
Gaming, Retail, Tourism, etc.
5. Predicting Earthquakes – Teaches a computer to perform viscoelastic
computations which are used in predicting earthquakes.

Reinforcement Learning:
A reinforcement learning algorithm, or agent, learns by interacting with its
environment. The agent receives rewards by performing correctly and penalties for
performing incorrectly. The agent learns without intervention from a human by
maximizing its reward and minimizing its penalty. It is a type of dynamic programming
that trains algorithms using a system of reward and punishment.

In the above example, the agent is given two options, i.e. a path
with water or a path with fire. A reinforcement algorithm works on a reward system:
if the agent takes the fire path, rewards are subtracted and the agent learns that it
should avoid the fire path. If it had chosen the water path, or the safe
path, then some points would have been added to the reward, and the agent
would learn which path is safe and which is not.
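The reward-and-penalty update can be sketched in code. The following is a deliberately simplified, single-state version of tabular Q-learning for the fire-path/water-path idea; the reward values, learning rate and exploration rate are assumptions chosen for illustration only.

# A minimal, illustrative reinforcement-learning sketch: one state, two actions.
import random

q = {"fire": 0.0, "water": 0.0}      # value estimate for each action
alpha = 0.5                          # learning rate
rewards = {"fire": -10, "water": +10}

for episode in range(100):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < 0.2:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    # Update the estimate towards the observed reward (penalty for the fire path).
    q[action] += alpha * (rewards[action] - q[action])

print(q)   # the agent learns to prefer the water (safe) path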

Here’s a table that sums up the difference between Regression, Classification, and
Clustering.

Regression vs Classification vs Clustering - Introduction To Machine Learning


Linear Regression:
Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as sales,
salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent (y)
variable and one or more independent (x) variables, hence the name linear regression.
Since linear regression shows a linear relationship, it finds how the value of the
dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:

Mathematically, we can represent linear regression as: y = a0 + a1x + ε


Here,
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error
The values for x and y variables are training datasets for Linear Regression model
representation.
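As a small illustration, the sketch below fits y = a0 + a1x to an invented data set using NumPy's least-squares polyfit; the numbers are assumptions made purely for the example.

# A minimal sketch of fitting y = a0 + a1*x by ordinary least squares.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])        # roughly y = 1 + 2x plus noise

a1, a0 = np.polyfit(x, y, deg=1)                # slope and intercept of the best-fit line
print("intercept a0:", round(a0, 2), "coefficient a1:", round(a1, 2))
print("prediction at x=6:", round(a0 + a1 * 6, 2))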
Types of Linear Regression
Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.
o Multiple Linear Regressions:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is called
Multiple Linear Regression.
Linear Regression Line
A linear line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent variable
increases on X-axis, then such a relationship is termed as a Positive linear
relationship.

o Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable
increases on the X-axis, then such a relationship is called a negative linear
relationship.
Finding the best fit line:
When working with linear regression, our main goal is to find the best fit line that
means the error between predicted values and actual values should be minimized.
The best fit line will have the least error.
Different values for the weights or line coefficients (a0, a1) give different regression
lines, so we need to calculate the best values for a0 and a1 to find the best fit
line; to calculate this we use a cost function.
Cost function-
o The different values for the weights or line coefficients (a0, a1) give different
regression lines, and the cost function is used to estimate the values of the
coefficients for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how
a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is
the average of the squared errors between the predicted values and the actual
values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) Σ (Yi − (a1xi + a0))²

Where,
N = total number of observations
Yi = actual value
(a1xi + a0) = predicted value.
Residuals: The distance between the actual value and the predicted value is called the
residual. If the observed points are far from the regression line, the residuals will be
high and so will the cost function. If the scatter points are close to the regression
line, the residuals will be small and hence the cost function will be small.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the
cost function.
o A regression model uses gradient descent to update the coefficients of the line
by reducing the cost function.
o It starts with a random selection of coefficient values and then iteratively
updates them to reach the minimum of the cost function (see the sketch below).
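A minimal gradient-descent sketch for this MSE cost, using the same kind of invented data as above, might look as follows; the learning rate and iteration count are arbitrary choices for illustration.

# A minimal gradient-descent sketch for the MSE cost of y = a0 + a1*x.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

a0, a1 = 0.0, 0.0          # start from zero coefficients
lr = 0.01                  # learning rate

for step in range(5000):
    pred = a0 + a1 * x
    error = pred - y
    # Gradients of MSE = (1/N) * sum((a0 + a1*x - y)^2) with respect to a0 and a1
    grad_a0 = 2 * error.mean()
    grad_a1 = 2 * (error * x).mean()
    a0 -= lr * grad_a0
    a1 -= lr * grad_a1

print("a0:", round(a0, 2), "a1:", round(a1, 2), "MSE:", round((error ** 2).mean(), 3))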
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be achieved by below method:
1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and
independent variables on a scale of 0-100%.
o A high value of R-squared indicates a small difference between the
predicted values and actual values and hence represents a good model.
o It is also called a coefficient of determination, or coefficient of multiple
determination for multiple regression.
o It can be calculated from the below formula:

R² = Explained variation / Total variation = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)²,
where Ŷi is the predicted value and Ȳ is the mean of the actual values.
Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are formal
checks to make while building a Linear Regression model, which ensure that you get
the best possible result from the given dataset.
o Linear relationship between the features and target:
Linear regression assumes the linear relationship between the dependent and
independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables.
Due to multicollinearity, it may be difficult to find the true relationship between
the predictors and the target variable; that is, it is difficult to determine
which predictor variable is affecting the target variable and which is not. So the
model assumes either little or no multicollinearity between the features or
independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the
values of independent variables. With homoscedasticity, there should be no
clear pattern distribution of data in the scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error terms follow a normal
distribution. If the error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which may
cause difficulties in estimating the coefficients.
This can be checked using a q-q plot: if the plot shows a straight line without
any deviation, the errors are normally distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in the error terms. If there
is any correlation in the error terms, it will drastically reduce the
accuracy of the model. Autocorrelation usually occurs when there is a dependency
between residual errors.
Advantages And Disadvantages:

Advantages Disadvantages
Linear regression performs
The assumption of linearity between
exceptionally well for linearly
dependent and independent variables
separable data
Easier to implement, interpret and
It is often quite prone to noise and over fitting
efficient to train
It handles overfitting pretty well using
dimensionally reduction techniques, Linear regression is quite sensitive to outliers
regularization, and cross-validation
One more advantage is the
extrapolation beyond a specific data It is prone to multicollinearity
set
Chapter 3
Introduction to Natural Language Processing
Language is a method of communication with the help of which we can speak, read
and write. For example, we think, we make decisions, plans and more in natural
language; precisely, in words. However, the big question that confronts us in this AI
era is whether we can communicate in a similar manner with computers. In other words,
can human beings communicate with computers in their natural language?

As early as the 1950s computer scientists began attempts at using software to process
and analyze textual components, sentiment, parts of speech, and the various entities
that make up a body of text. Until relatively recently, processing and analyzing
language has been quite a challenge.

Ever since IBM’s Watson won on the game show Jeopardy! , the promise of machines
being able to understand language has slowly edged closer.

Natural language processing is essentially the ability to take a body of text and extract
meaning from it using a computer.

It is a challenge for us to develop NLP applications because computers need


structured data, but human speech is unstructured and often ambiguous in nature.
While computational language is very structured (think XML or JSON) and easily
understood by a machine, written words by humans are quite messy and
unstructured—meaning when you write about a house, friend, pet, or a phone in a
paragraph, there’s no explicit reference that labels each of them as such.
In this sense, we can say that Natural Language Processing (NLP) is the sub-field of
Computer Science especially Artificial Intelligence (AI) that is concerned about
enabling computers to understand and process human language. Technically, the
main task of NLP would be to program computers for analyzing and processing huge
amount of natural language data.

For example, take this simple sentence:

I drove my friend Mary to the park in my Tesla while listening to music on my iPhone.

For a human reader, this is an easily understandable sentence and paints a clear
picture of what’s happening. But for a computer, not so much. For a machine, the
sentence would need to be broken down into its structured parts. Instead of an entire
sentence, the computer would need to see both the individual parts and entities
along with the relations between these entities.

Humans understand that Mary is a friend and that a Tesla is likely a car. Since we have
the context of bringing our friend along with us, we intuitively rule out that we’re
driving something else, like a bicycle. Additionally, after many years of popularity and
cultural references, we all know that an iPhone is a smartphone.

None of the above is understood by a computer without assistance. Now let’s take a
look at how that sentence could be written as structured data from the outset. If
developers had made time in advance to structure the data in our sentence, in
XML you’d see the following entities:
<friend>Mary</friend>
<car>Tesla</car>
<phone>iPhone</phone>
But obviously, this can’t happen on the fly without assistance. As mentioned
previously, we have significantly more unstructured data than structured. And unless
time is taken to apply the correct structure to the text in advance, we have a massive
problem that needs solving. This is where NLP enters the picture.
Natural language processing is needed when you wish to mine unstructured data and
extract meaningful insight from text. General applications of NLP attempt to identify
common entities from a body of text; but when you start working with domain-
specific content, a custom model needs training.
The Components of NLP
In order to understand NLP, we first need to understand the components of its model.
Specifically, natural language processing lets you analyze and extract key metadata
from text, including entities, relations, concepts, sentiment, and emotion.
Let’s briefly discuss each of these aspects that can be extracted from a body of text.
Entities
Likely the most common use case for natural language processing, entities are the
people, places, organizations, and things in your text. In our initial example sentence,
we identified several entities in the text—friend, car, and phone.
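As an illustration of automatic entity extraction, the hedged sketch below uses the open-source spaCy library; it assumes spaCy and its small English model (en_core_web_sm) are installed, and the exact labels produced depend on that model rather than on this text.

# A minimal entity-extraction sketch using spaCy (labels depend on the model).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I drove my friend Mary to the park in my Tesla while listening to music on my iPhone.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)   # e.g. Mary -> PERSON, Tesla -> ORG or PRODUCT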
Relations
How are entities related? Natural language processing can identify whether there is a
relationship between multiple entities and tell the type of relation between them. For
example, a “createdBy” relation might connect the entities “iPhone” and “Apple.”
Concepts
One of the more magical aspects of NLP is extracting general concepts from the body
of text that may not explicitly appear in the corpus. This is a potent tool. For example,
analysis of an article about Tesla may return the concepts “electric cars” or “Elon
Musk,” even if those terms are not explicitly mentioned in the text.
Keywords
NLP can identify the important and relevant keywords in your content. This allows you
to create a base of words from the corpus that are important to the business value
you’re trying to drive.
Semantic Roles
Semantic roles are the subjects, actions, and the objects they act upon in the text.
Take the sentence, “IBM bought a company.” In this sentence the subject is “IBM,”
the action is “bought,” and the object is “company.” NLP can parse sentences into
these semantic roles for a variety of business uses—for example, determining which
companies were acquired last week or receiving notifications any time a particular
company launches a product.
Categories
Categories describe what a piece of content is about at a high level. NLP can analyze
text and then place it into a hierarchical taxonomy, providing categories to use in
applications. Depending on the content, categories could be one or more of sports,
finance, travel, computing, and so on. Possible applications include placing relevant
ads alongside user-generated content on a website or displaying all the articles talking
about a particular subject.
Emotion
Whether you’re trying to understand the emotion conveyed by a post on social media
or analyze incoming customer support tickets, detecting emotions in text is extremely
valuable. Is the content conveying anger, disgust, fear, joy, or sadness? Emotion
detection in NLP will assist in solving this problem.
Sentiment
Similarly, what is the general sentiment in the content? Is it positive, neutral, or
negative? NLP can provide a score as to the level of positive or negative sentiment of
the text. Again, this proves to be extremely valuable in the context of customer
support. This enables automatic understanding of sentiment related to your product
on a continual basis. Now that we’ve covered what constitutes natural language
processing, let’s look at some examples to illustrate how NLP is currently being used
across various industries.
Enterprise Applications of NLP
The following are some of the best representations of the power of NLP.

Social Media Analysis


One of the most common enterprise applications of natural language processing is in
the area of social media monitoring, analytics, and analysis. Over 500 million tweets
are sent per day. How can we extract valuable insights from them? What are the
relevant trending topics and hashtags for a business? Natural language processing can
deliver this information and more by analyzing social media. Not only can sentiment
and mentions be mined across all this user generated social content, but specific
conversations can also be found to help companies better interact with customers.
Customer Support
A recent study has shown that companies lose more than $62 billion annually due to
poor customer service, a 51% increase since 2013. Therefore, there’s obviously a need
for ways to improve customer support.
Natural language processing can also assist in making sure support representatives
are both consistent as well as nonaggressive (or any other trait the company is
looking to minimize) in their language. When preparing a reply to a support question,
an application incorporated with NLP can provide a suggested vocabulary to assist this
process. These approaches to customer support can make the overall system much
faster, more efficient, and easier to maintain, and subsequently reduce costs over a
traditional ticketing system.
Business Intelligence
The inability to use unstructured data, both internal and external, for business
decision making is a critical problem. Natural language processing allows all users,
especially nontechnical experts, to ask questions of the data as opposed to needing to
write a complex query of the database. This allows the business users to ask questions
of the data without having to request developer resources to make it happen.
Content Marketing and Recommendation
As it becomes harder to reach customers with advertising, companies now look to
content marketing to produce unique stories that will drive traffic and increase brand
awareness. Not only do they look for new content to create, but companies also want
better ways to recommend more relevant content to their readers. Everyone is
familiar with being recommended articles that are merely click bait with little value or
applicability to your interests.
Some NLP applications
The following list is not complete, but useful systems have been built for:
• Spelling and grammar checking
• Optical character recognition (OCR)
• screen readers for blind and partially sighted users
• Augmentative and alternative communication (i.e., systems to aid people who have
difficulty communicating because of disability)
• machine aided translation (i.e., systems which help a human translator, e.g., by
storing translations of phrases and providing online dictionaries integrated with word
processors, etc)
• Lexicographers’ tools
• Information retrieval
• Document classification (filtering, routing)
• document clustering
• Information extraction
• question answering
• Summarization
• Text segmentation
• Exam marking
• Report generation (possibly multilingual)
• Machine translation
• Natural language interfaces to databases
• email understanding
• Dialogue systems
Natural language processing also forms the backbone for creating conversational
applications, more commonly known as chat bots.
NLP Phases
Following diagram shows the phases or logical steps in natural language processing:
Morphological Processing
It is the first phase of NLP. The purpose of this phase is to break chunks of language
input into sets of tokens corresponding to paragraphs, sentences and words. For
example, a word like “uneasy” can be broken into two sub-word tokens as “un-easy”.
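As a small illustration of breaking text into tokens, the sketch below uses NLTK; it assumes NLTK is installed and its 'punkt' tokenizer data has been downloaded (nltk.download('punkt')).

# A minimal tokenization sketch: sentences and words.
import nltk

text = "NLP is fun. A word like uneasy can be split into sub-word tokens."
print(nltk.sent_tokenize(text))   # ['NLP is fun.', 'A word like uneasy ...']
print(nltk.word_tokenize(text))   # ['NLP', 'is', 'fun', '.', 'A', 'word', ...]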
Syntax Analysis
It is the second phase of NLP. The purpose of this phase is twofold: to check whether a
sentence is well formed, and to break it up into a structure that shows the
syntactic relationships between the different words. For example, a sentence like
“The school goes to the boy” would be rejected by the syntax analyzer or parser.
Semantic Analysis
It is the third phase of NLP. The purpose of this phase is to draw exact meaning, or
you can say dictionary meaning from the text. The text is checked for meaningfulness.
For example, semantic analyzer would reject a sentence like “Hot ice-cream”.
Pragmatic Analysis
It is the fourth phase of NLP. Pragmatic analysis simply fits the actual objects/events,
which exist in a given context with object references obtained during the last phase
(semantic analysis). For example, the sentence “Put the banana in the basket on the
shelf” can have two semantic interpretations and pragmatic analyzer will choose
between these two possibilities.
Natural language Understanding:
Natural-language understanding (NLU) or natural-language interpretation (NLI) is a
subtopic of natural-language processing in artificial intelligence that deals with
machine reading comprehension. Natural-language understanding is considered
an AI-hard problem.

There is considerable commercial interest in the field because of its application


to automated reasoning, machine translation, question answering, news-
gathering, text categorization, voice-activation, archiving, and large-scale content
analysis.

Natural Language Understanding is a part of Natural Language Processing. It


undertakes the analysis of content, text-based metadata and generates summarized
content in natural, human language. It is opposite to the process of Natural Language
Generation. NLG deals with input in the form of data and generates output in the
form of plain text while Natural Language Understanding tools process text or voice
that is in natural language and generates appropriate responses by summarizing,
editing or creating vocal responses. Natural Language Processing is a broad term which
includes both Natural Language Understanding and Natural Language Generation,
along with many other techniques revolving around translating and analyzing natural
language by machines to perform certain commands.

Examples of Natural Language Processing

Natural Language Processing is everywhere and we use it in our daily lives without
even realizing it. Do you know how spam messages are separated from your emails?
Or autocorrect and predictive typing that saves so much of our time, how does that
happen? Well, it is all part of Natural Language Processing. Here are some examples
of Natural Language Processing technologies used widely:
● Intelligent personal assistants – We are all familiar with Siri and Cortana. These
mobile software products, which perform tasks and offer services using a combination of
user input, location awareness, and the ability to access information from a variety of
online sources, are undoubtedly one of the biggest achievements of natural language
processing.
● Machine translation – To read the description of a beautiful picture on Instagram
or to read updates on Facebook, we have all used that ‘see translation’ command at
least once. And Google’s translation service helps in urgent situations or sometimes
just to learn a few new words. These are all examples of machine translation, where
machines provide us with translations from one natural language to another.
● Speech recognition – Converting spoken words into data is an example of
natural language processing. It is used for multiple purposes like dictating to Microsoft
Word, voice biometrics, voice user interfaces, etc.
● Affective computing – It is essentially emotional intelligence training for
machines. They learn to understand your emotions, feelings and ideas to interact with
you in more humane ways.
● Natural language generation – Natural language generation tools scan
structured data, undertake analysis and generate information in text format produced
in natural language.
● Natural language understanding – As explained above, it scans content written
in natural languages and generates small, comprehensible summaries of text.
Best tools for Natural Language Understanding available today
Natural Language Processing deals with human language in its most natural form and
on a real-time basis, as it appears in social media content, emails, web pages, tweets,
product descriptions, newspaper articles, and scientific research papers, etc, in a
variety of languages. Businesses need to keep a tab on all this content, constantly.
Here are a few popular natural language understanding software products which
effectively aid them in this daunting task.
● Wolfram – Wolfram Alpha is an answer engine developed by Wolfram Alpha
LLC (a subsidiary of Wolfram Research). It is an online service that provides answers to
factual questions by computing the answer from externally sourced, “curated data”.
● Natural Language Toolkit – The Natural Language Toolkit, also known as NLTK, is
a suite of programs used for symbolic and statistical natural language processing
(NLP) for the English language. It is written in the Python programming language and
was developed by Steven Bird and Edward Loper at the University of Pennsylvania.
● Stanford CoreNLP – Stanford CoreNLP is an annotation-based NLP pipeline that
offers core natural language analysis. The basic distribution provides model files for
the analysis of English, but the engine is compatible with models for other languages.
● GATE (General Architecture for Text Engineering) – It offers a wide range of
natural language processing tasks. It is mature software that has been used across
industries for more than 15 years.
● Apache OpenNLP – Apache OpenNLP is a toolkit based on machine
learning for processing natural language text. It is written in Java and is produced by
the Apache Software Foundation. It offers services like tokenization, chunking, parsing,
part-of-speech tagging, sentence segmentation, etc.

Sentiment Analysis:
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use
of natural language processing, text analysis, computational linguistics,
and biometrics to systematically identify, extract, quantify, and study affective states
and subjective information. Sentiment analysis is widely applied to voice of the
customer materials such as reviews and survey responses, online and social media,
and healthcare materials for applications that range from marketing to customer
service to clinical medicine.

Examples:
The Restaurant is great, Staff are really friendly and food is delicious---Positive
(Sentiment)
I would not recommend this restaurant to anyone, food is terrible and is really
expensiveNegative (sentiment)
More Challenging Examples:
The movie is surprising with plenty of unsettling plot twists. (Negative term used in a
positive sense in certain domains).
I love my mobile but would not recommend it to any of my colleagues. (Qualified
positive sentiment, difficult to categorize.)
Sentiment Analysis is the process of determining whether a piece of writing is
positive, negative or neutral. A sentiment analysis system for text analysis combines
natural language processing (NLP) and machine learning techniques to assign
weighted sentiment scores to the entities, topics, themes and categories within a
sentence or phrase.
Why is Sentiment Analysis needed?

In today’s environment where we’re suffering from data overload (although this does
not mean better or deeper insights), companies might have mountains of customer
feedback collected. Yet for mere humans, it’s still impossible to analyze it manually
without any sort of error or bias.

Oftentimes, companies with the best intentions find themselves in an insights


vacuum. You know you need insights to inform your decision making. And you know
that you’re lacking them. But you don’t know how best to get them. Sentiment
analysis provides answers into what the most important issues are. Because
sentiment analysis can be automated, decisions can be made based on a significant
amount of data rather than plain intuition that isn’t always right.
How does Sentiment Analysis work?
The Basics

Basic sentiment analysis of text documents follows a straightforward process:

1. Break each text document down into its component parts (sentences, phrases,
tokens and parts of speech)
2. Identify each sentiment-bearing phrase and component
3. Assign a sentiment score to each phrase and component (-1 to +1)
4. Optional: Combine scores for multi-layered sentiment analysis
For Example
Terrible pitching and awful hitting led to another crushing loss.
Bad pitching and mediocre hitting cost us another close game.
Both sentences discuss a similar subject, the loss of a baseball game. But you, the
human reading them, can clearly see that first sentence’s tone is much more negative.
When you read the sentences above, your brain draws on your accumulated
knowledge to identify each sentiment-bearing phrase and interpret their negativity or
positivity. Usually this happens subconsciously. For example, you instinctively know
that a game that ends in a “crushing loss” has a higher score differential than the
“close game”, because you understand that “crushing” is a stronger adjective than
“close”.
Two basic techniques for sentiment analysis
1. Rule based sentiment analysis
Usually, a rule-based system uses a set of human-crafted rules to help identify
subjectivity, polarity, or the subject of an opinion.

These rules may include various techniques developed in computational linguistics,
such as:

 Stemming, tokenization, part-of-speech tagging and parsing.
 Lexicons (i.e. lists of words and expressions).
Here’s a basic example of how a rule-based system works:

1. Defines two lists of polarized words (e.g. negative words such
as bad, worst, ugly, etc. and positive words such as good, best, beautiful, etc.).
2. Counts the number of positive and negative words that appear in a given text.
3. If the number of positive word appearances is greater than the number of
negative word appearances, the system returns a positive sentiment, and vice
versa. If the numbers are even, the system returns a neutral sentiment.

Rule-based systems are very naive since they don't take into account how words are
combined in a sequence. Of course, more advanced processing techniques can be
used, and new rules added to support new expressions and vocabulary. However,
adding new rules may affect previous results, and the whole system can get very
complex. Since rule-based systems often require fine-tuning and maintenance, they’ll
also need regular investments.

A rule-based system uses a dictionary of words labelled by sentiment to determine the
sentiment of a sentence. Sentiment scores typically need to be combined with additional rules to
mitigate sentences containing negations, sarcasm, or dependent clauses.

Sentiment Dictionary Example:

-1 = Negative / +1 = Positive
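As a concrete illustration of this counting approach, here is a minimal Python sketch; the word lists and the scoring rule below are tiny, purely illustrative stand-ins rather than a real sentiment lexicon.

# Minimal rule-based sentiment sketch: count polarized words and compare the counts.
POSITIVE = {"good", "great", "best", "beautiful", "friendly", "delicious"}
NEGATIVE = {"bad", "worst", "ugly", "terrible", "awful", "expensive"}

def rule_based_sentiment(text: str) -> str:
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(rule_based_sentiment("The restaurant is great, staff are friendly and food is delicious"))
# -> positive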

2. Machine Learning Based Sentiment Analysis

Here, we train an ML model to recognize the sentiment based on the words and their
order using a sentiment-labelled training set. This approach depends largely on the
type of algorithm and the quality of the training data used.

Consider, for example, a stock trading application. We take news
headlines, and narrow them to lines which mention the particular company that we
are interested in (often done by another NLP technique, called Named Entity
Recognition), and then gauge the polarity of the sentiment in the text.
One way to make this approach fit other types of problems is to measure polarity
across other dimensions. You could look at specific emotions. How angry was the
person when they were writing the text? How much fear is conveyed in the text?
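As a rough sketch of the machine learning approach, the example below trains a tiny bag-of-words classifier with scikit-learn; the training texts and labels are toy data, used for illustration only.

# Minimal ML-based sentiment sketch using scikit-learn on illustrative toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the food is delicious", "the staff are friendly",
         "the food is terrible", "really expensive and awful"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["the food is awful"]))   # likely ['negative'] given the toy data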

Applications

Broadly speaking, sentiment analysis is most effective when used as a tool for Voice of
Customer and Voice of Employee. Business analysts, product managers, customer
support directors, human resources and workforce analysts, and other stakeholders
use sentiment analysis to understand how customers and employees feel about
particular subjects, and why they feel that way.

Segmentation and recognition:

SEGMENTATION

Text segmentation is the process of dividing written text into meaningful units, such
as words, sentences, or topics. The term applies both to mental processes used by
humans when reading text, and to artificial processes implemented in computers,
which are the subject of natural language processing.

Speech segmentation is the process of identifying the boundaries
between words, syllables, or phonemes in spoken natural languages. The term applies
both to the mental processes used by humans, and to artificial processes of natural
language processing.
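As an illustration, text segmentation into sentences and words can be sketched with NLTK's tokenizers, assuming the 'punkt' tokenizer models have been downloaded.

# Minimal text segmentation sketch with NLTK (requires the 'punkt' models).
import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Text segmentation splits text into units. Words and sentences are typical units."
print(sent_tokenize(text))   # list of sentences
print(word_tokenize(text))   # list of word tokens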
RECOGNITION

Named entity recognition (NER), also known as entity chunking/extraction, is a
popular technique used in information extraction to identify and segment the named
entities and classify or categorize them under various predefined classes.

In any text document, there are particular terms that represent specific entities that
are more informative and have a unique context. These entities are known as named
entities, which more specifically refer to terms that represent real-world objects like
people, places, organizations, and so on, which are often denoted by proper names. A
naive approach could be to find these by looking at the noun phrases in text
documents.
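As an illustration, the sketch below runs NER with the spaCy library, assuming the small English model en_core_web_sm has been installed.

# Minimal NER sketch with spaCy (assumes the en_core_web_sm model is installed).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Steven Bird and Edward Loper developed NLTK at the University of Pennsylvania.")
for ent in doc.ents:
    print(ent.text, ent.label_)    # named entities with predefined class labels (PERSON, ORG, ...)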

Speech recognition is an interdisciplinary subfield of computer
science and computational linguistics that develops methodologies and technologies
that enable the recognition and translation of spoken language into text by
computers. It is also known as automatic speech recognition (ASR), computer speech
recognition or speech to text (STT). It incorporates knowledge and research in
the computer science, linguistics and computer engineering fields.
Chapter 4
Introduction to Speech Recognition & Synthesis

Speech Fundamentals:
Developing and understanding Automatic Speech Recognition (ASR) systems is
an inter-disciplinary activity, taking expertise in linguistics, computer science,
mathematics, and electrical engineering.

When a human speaks a word, they cause their voice to make a time-varying
pattern of sounds. These sounds are waves of pressure that propagate through the
air. The sounds are captured by a sensor, such as a microphone or microphone array,
and turned into a sequence of numbers representing the pressure change over time.
The automatic speech recognition system converts this time-pressure signal into a
time-frequency-energy signal. It has been trained on a curated set of labeled speech
sounds, and labels the sounds it is presented with. These acoustic labels are combined
with a model of word pronunciation and a model of word sequences, to create a
textual representation of what was said.

The following definitions are the basics needed for understanding speech recognition
technology.

Utterance

An utterance is the vocalization (speaking) of a word or words that represent a
single meaning to the computer. Utterances can be a single word, a few words,
a sentence, or even multiple sentences.

Speaker Dependence

Speaker dependent systems are designed around a specific speaker. They
generally are more accurate for the correct speaker, but much less accurate for
other speakers. They assume the speaker will speak in a consistent voice and
tempo. Speaker independent systems are designed for a variety of speakers.
Adaptive systems usually start as speaker independent systems and utilize
training techniques to adapt to the speaker to increase their recognition
accuracy.

Vocabularies
Vocabularies (or dictionaries) are lists of words or utterances that can be
recognized by the SR system. Generally, smaller vocabularies are easier for a
computer to recognize, while larger vocabularies are more difficult. Unlike
normal dictionaries, each entry doesn't have to be a single word. They can be as
long as a sentence or two. Smaller vocabularies can have as few as 1 or 2
recognized utterances (e.g."Wake Up"), while very large vocabularies can have
a hundred thousand or more!

Accuracy

The ability of a recognizer can be examined by measuring its accuracy - or how
well it recognizes utterances. This includes not only correctly identifying an
utterance but also identifying if the spoken utterance is not in its vocabulary.
Good ASR systems have an accuracy of 98% or more! The acceptable accuracy
of a system really depends on the application.

Training

Some speech recognizers have the ability to adapt to a speaker. When the
system has this ability, it may allow training to take place. An ASR system is
trained by having the speaker repeat standard or common phrases and
adjusting its comparison algorithms to match that particular speaker. Training a
recognizer usually improves its accuracy.

Training can also be used by speakers that have difficulty speaking, or
pronouncing certain words. As long as the speaker can consistently repeat an
utterance, ASR systems with training should be able to adapt.

Speech Analysis:
Speech analytics is the process of analyzing recorded calls to gather customer
information to improve communication and future interaction. The process is
primarily used by customer contact centers to extract information buried in client
interactions with an enterprise.

Although speech analytics includes elements of automatic speech recognition,
it is known for analyzing the topic being discussed, which is weighed against the
emotional character of the speech and the amount and locations of speech versus
non-speech during the interaction.

Speech analytics in contact centers can be used to mine recorded customer
interactions to surface the intelligence essential for building effective cost
containment and customer service strategies. The technology can pinpoint cost
drivers, identify trends, identify strengths and weaknesses in processes and
products, and help understand how the marketplace perceives offerings.

Speech Modelling:
Speech recognition works using algorithms organized into several models:

 Acoustic Modeling
 Language modeling
 Hidden Markov Models

Acoustic Modeling:

Acoustic modeling represents the relationship between linguistic units of
speech and audio signals.

An acoustic model is a file that contains statistical representations of each of
the distinct sounds that make up a word. Each of these statistical representations is
assigned a label called a phoneme. The English language has about 40 distinct sounds
that are useful for speech recognition, and thus we have 40 different phonemes.

An acoustic model is created by taking a large database of speech (called
a speech corpus) and using special training algorithms to create statistical
representations for each phoneme in a language.  These statistical representations
are called Hidden Markov Models ("HMM"s).  Each phoneme has its own HMM.

For example, if the system is set up with a simple grammar file to recognize the
word "house" (whose phonemes are: "hh aw s"), here are the (simplified) steps that
the speech recognition engine might take:
 The speech decoder listens for the distinct sounds spoken by a user and then
looks for a matching HMM in the Acoustic Model.  In our example, each of the
phonemes in the word house has its own HMM:
o  hh 
o  aw
o  s 
 When it finds a matching HMM in the acoustic model, the decoder takes note
of the phoneme. The decoder keeps track of the matching phonemes until it
reaches a pause in the users speech.

 When a pause is reached, the decoder looks up the matching series of
phonemes it heard (i.e. "hh aw s") in its Pronunciation Dictionary to determine
which word was spoken.  In our example, one of the entries in the
pronunciation dictionary is HOUSE: 
o ...
o HOUSAND         [HOUSAND]       hh aw s ax n d
o HOUSDEN         [HOUSDEN]       hh aw s d ax n
o HOUSE           [HOUSE]         hh aw s
o HOUSE'S         [HOUSE'S]       hh aw s ix z
o HOUSEAL         [HOUSEAL]       hh aw s ax l
o HOUSEBOAT       [HOUSEBOAT]     hh aw s b ow t
o ...

 The decoder then looks in the Grammar file for a matching word or phrase. 
Since our grammar in this example only contains one word ("HOUSE"), it
returns the word "HOUSE" to the calling program.

Language modeling:

Language modeling matches sounds with word sequences to help distinguish
between words that sound similar.

Language models are used to constrain search in a decoder by limiting the
number of possible words that need to be considered at any one point in the search.
The consequence is faster execution and higher accuracy.

Language models constrain search either absolutely (by enumerating some
small subset of possible expansions) or probabilistically (by computing a likelihood for
each possible successor word). The former will usually have an associated grammar
that is compiled down into a graph; the latter will be trained from a corpus.

Statistical language models (SLMs) are good for free-form input, such as
dictation or spontaneous speech, where it's not practical or possible to a priori specify
all possible legal word sequences.

Trigram SLMs are probably the most common ones used in ASR and represent a
good balance between complexity and robust estimation. A trigram model encodes
the probability of a word (w3) given its immediate two-word history, i.e. p(w3 | w1
w2). In practice trigram models can be "backed-off" to bigram and unigram models,
allowing the decoder to emit any possible word sequence (provided that the acoustic
and lexical evidence is there).
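As a small illustration, the sketch below estimates maximum-likelihood trigram probabilities from counts (without the back-off mentioned above); the toy corpus is purely illustrative.

# Minimal sketch of maximum-likelihood trigram estimation: p(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2).
from collections import Counter

corpus = "the cat sat on the mat the cat sat on the hat".split()   # toy corpus, illustrative only
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def p_trigram(w1, w2, w3):
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0

print(p_trigram("cat", "sat", "on"))   # 1.0 in this toy corpus
print(p_trigram("on", "the", "mat"))   # 0.5 ("on the" is followed by "mat" once and "hat" once)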

Difference between a phone and a phoneme:


 A phoneme is the smallest structural unit that distinguishes meaning in a
language.  Phonemes are not the physical segments themselves, but are
cognitive abstractions or categorizations of them.
 On the other hand, phones refer to the instances of phonemes in the actual
utterances - i.e. the physical segments.
 For example: the words "madder" and "matter" are composed of
distinct phonemes; however, in American English, both words are pronounced
almost identically, which means that their phones are the same, or at least very
close in the acoustic domain.

Hidden Markov Models:

Modern general-purpose speech recognition systems are based on Hidden
Markov Models. These are statistical models that output a sequence of symbols or
quantities. HMMs are used in speech recognition because a speech signal can be
viewed as a piecewise stationary signal or a short-time stationary signal. In a short
time-scale (e.g., 10 milliseconds), speech can be approximated as a stationary
process. Speech can be thought of as a Markov model for many stochastic purposes.
Speech Recognition:
Theme

Speech is produced by the passage of air through various obstructions and
routings of the human larynx, throat, mouth, tongue, lips, nose etc. It is emitted
as a series of pressure waves. To automatically convert these pressure waves
into written words, a series of operations is performed. These involve capturing
and representing the pressure waves in appropriate notations, creating feature
vectors to represent time-slices of the converted input, clustering and purifying
the input, matching the results against a library of known sound vectorized
waveforms, choosing the most likely series of letter-sounds, and then selecting
the most likely sequence of words.

Summary of contents
1. Speech Recognition Systems

Introduction
Early speech recognition systems tried to model the human articulatory
channel. They didn’t work. Since the 1970s, these systems have been trained
on example data rather than defined using rules. The transition was caused by
the success of the HEARSAY and HARPY systems at CMU.

Step 1: Speech
Speech is pressure waves, travelling through the air. Created by vibrations of
larynx, followed by openings or blockages en route to the outside. Vowels and
consonants.

Step 2: Internal representation

1.The basic pressure wave is full of noise and very context-sensitive, so it is very
difficult to work with. First, perform Fourier transform, to represent the wave
as a sum of waves at a range of frequencies, within a certain window. This is
what a speech spectrogram shows. Now work in the
Frequency domain (on the y axis), not the waveform domain. Try various
windows to minimize edge effects, etc.
2.Decompose (deconvolve) the Fourier-transformed waves into a set of vectors,
by cepstral analysis. Chop up the timeline (x-axis) and the frequency space (y-
axis) to obtain ‘little squares’, from which you obtain the quantized vectors.
Now certain operations become simpler (like working in log space; can add
instead of multiply), though some new steps become necessary. Move a
window over the Fourier transform and measure the strengths of the voice
natural frequencies f0, f1, f2…
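As an illustration of this feature-extraction step, the following sketch computes MFCC (cepstral) features with the librosa library; the file name 'speech.wav' and the sampling rate are placeholders, not part of the original material.

# Minimal sketch of the feature-extraction step using librosa; 'speech.wav' is a placeholder file.
import librosa

y, sr = librosa.load("speech.wav", sr=16000)            # pressure wave as a 1-D array of samples
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # cepstral coefficients, one column per time window
print(mfcc.shape)                                       # (13, number_of_frames)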

Step 3: Purify and match: Acoustic Model

1. After quantizing the frequency vectors, represent them in an abstract vector
space (axes: time and MFCC cepstral coefficients). The resulting vectors, one
per timeslice, provide a picture of the incoming sound wave. Depending on
window size, speech inconsistencies, noise, etc., these vectors are not pure
reflections of the speaker’s sounds. So purify the vector series by clustering
them, using various algorithms, to find the major sound ‘bundles’. Here it’s
possible to merge vectors across time as well, to obtain durations.
2. Then match the bundles against a library of standard sound bundle ‘shapes’,
represented as durations vs. cepstral coefficients. To save space, these
standard sound bundles are represented as mixtures of Gaussian curves (then
one need only save two or three parameters per curve), which appear as
contour lines when plotted for a single MFCC coefficient. Try to fit them over the
clustered points (like umbrellas). To find the best match (which vector
corresponds with which portion of the curve?), use the EM algorithm.
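As a rough illustration of fitting Gaussian mixtures to feature vectors, the sketch below uses scikit-learn's GaussianMixture on random stand-in data; the array shapes and the number of components are assumptions made only for illustration.

# Minimal sketch of fitting a Gaussian mixture to illustrative, random MFCC-like feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

features = np.random.randn(500, 13)            # 500 time slices x 13 cepstral coefficients (toy data)
gmm = GaussianMixture(n_components=8, covariance_type="diag")
gmm.fit(features)                              # the EM algorithm runs internally
print(gmm.predict(features[:5]))               # most likely mixture component per frame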
Step 4: Identify sound sequences and words: Lexical Model
Now you have a series of sounds, and you want a series of letters. But
unfortunately, sounds and letters do not line up one-to-one. So first represent
typical sound sequences in a Hidden Markov Model (like a Finite State
Network). For each sound, create all possible links to all other sounds, and
arrange these sounds into the HMM. Initialize everything with equal transition
probabilities. Then train the transition probabilities on the links using training
data, for which you know both the sounds and the correct letters.
Given a new input (= sound sequence), use the Viterbi algorithm to match
the incoming series of sounds to the best path through the HMM, taking
into account likely sound shifts, etc., as given by the probabilistic sound
transitions on the HMM arcs.

(Figure: a small three-state HMM fragment with phoneme transition probabilities, e.g. /b/ 0.3, /a/ 0.5, /t/ 0.6.)

A typical large-vocabulary system takes into account context dependency
for the phonemes (so phonemes with different left and right context have
different realizations as HMM states); to do this it uses cepstral
normalization to normalize for different speaker and recording conditions.
One can do additional speaker normalization using vocal tract length
normalization (VTLN) for male-female normalization and maximum
likelihood linear regression (MLLR) for more general speaker adaptation.
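To make the matching idea in Step 4 concrete, here is a minimal sketch of the Viterbi algorithm over a toy HMM; the states, observation symbols and probabilities below are purely illustrative, not taken from a real acoustic model.

# Minimal Viterbi sketch over a toy HMM; all states, symbols and probabilities are illustrative only.
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, V[t][path[0]][1])
    return path

states = ["/h/", "/aw/", "/s/"]
start_p = {"/h/": 0.8, "/aw/": 0.1, "/s/": 0.1}
trans_p = {"/h/": {"/h/": 0.2, "/aw/": 0.7, "/s/": 0.1},
           "/aw/": {"/h/": 0.1, "/aw/": 0.3, "/s/": 0.6},
           "/s/": {"/h/": 0.1, "/aw/": 0.2, "/s/": 0.7}}
emit_p = {"/h/": {"A": 0.6, "B": 0.3, "C": 0.1},
          "/aw/": {"A": 0.2, "B": 0.6, "C": 0.2},
          "/s/": {"A": 0.1, "B": 0.2, "C": 0.7}}
print(viterbi(["A", "B", "C"], states, start_p, trans_p, emit_p))   # ['/h/', '/aw/', '/s/']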

Step 5: Sentences: Language Model

Finally, you have a series of words. But do they form a sentence? At this
point you could use a parser and a grammar to see. Trouble is, the
system’s top word selections often include errors; perhaps the second- or
the third-best words are the correct ones. So create an n-gram language
model (in practice, a trigram), for the domain you are working in. This
language model provides a probabilistic model of the word sequences
you are likely to encounter. Now match the incoming word sequence
(and all its most likely alternatives) against the n-gram language model,
using the Viterbi algorithm, and find what is indeed the most likely
sensible sentence.
Output the resulting sequence of words.
Overall System Architecture

(Figure: the typical components/architecture of an ASR system.)

2.Evaluation
Measures
The principal measure is Word Error Rate (WER): measure how many words
were recognized correctly in a known test sample.

WER = (S + I + D) * 100 / N

where N is the total number of words in the test set, and
S, I, and D are the total number of substitutions, insertions, and
deletions needed to convert the system’s output string into the
test string.
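A minimal sketch of computing WER via word-level edit distance is shown below; the example sentences are illustrative only.

# Minimal WER sketch: word-level edit distance (substitutions + insertions + deletions).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits between the first i reference words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] * 100.0 / len(ref)

print(wer("wreck a nice beach", "recognize speech"))   # all four reference words need edits -> 100.0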

WER tends to drop by a factor of 2 every 2 years; in a nicely controlled
setting in the lab, with limited vocabularies, systems do quite well (WER in
the low single digits). But in real life, which is noisy and unpredictable,
where people use made-up words and odd word mixtures, it’s a different
story.
In dialogue systems, people use Command Success Rate (CSR), in which
the dialogue engine and task help guide speech recognition; now measure
the success for each individual command, and for each task as a whole.
Performance

Corpus                Speech Type     Lexicon Size   Word Error Rate (%)   Human Error Rate (%)
connected digits      Spontaneous     10             0.3                   0.009
resource management   Read            1000           3.6                   0.1
air travel agent      Spontaneous     2000           2                     –
Wall Street Journal   Read            64000          7                     1
radio news            Mixed           64000          27                    –
tel. switchboard      Conversation    10000          38                    4
telephone call home   Conversation    10000          50                    –

                           pre-1975                        1975–1985                        1985–1997                      present
Unit recognized            sub-word; single word           sub-word                         sub-word                       sub-word
Unit of analysis           single word                     fixed phrases                    bigrams, trigrams              dialogue turns
Approaches to modeling     heuristic, ad hoc;              template matching;               mathematical; data-driven;     mathematical; probabilistic;
                           rule-based, declarative         data-driven; deterministic       probabilistic                  simple task analysis
Knowledge representation   heterogeneous                   homogeneous                      homogeneous                    unclear
Knowledge acquisition      intense manual effort           embedded in simple structures    learning + manual effort       automatic learning for dialogues

3.Speech translation
The goal: a translating telephone.
Research projects at CMU, Karlsruhe, ISI, ARL, etc.
The Verbmobil project in Germany translated between German and
French using English as interlingua. A large multi-site consortium, it kept
NLP funded in Germany for almost a decade, starting mid-1990s.
One commercial product (PC based): NEC, for $300, in 2002. They now sell
a PDA based version. 50,000 words, bigrams, some parsing and a little
semantic transfer.

The US Army Phraselator: English-Arabic-English in a very robust box (able
to withstand desert fighting conditions): speech recognition and phrasal
(table lookup) translation and output.

4.Prosody
An increasingly interesting topic today is the recognition of emotion and
other pragmatic signals in addition to the words. Human-human speech
is foundationally mediated by prosody (rhythm, intonation, etc.,
of speech). Speech is only natural when it is not ‘flat’: we infer a great
deal about the speaker’s inner state and goals from prosody.
Prosody is characterized by two attributes:
 Prominence: Intonation, rhythm, and lexical stress patterns, which
signal emphasis, intent, emotion
 Phrasing: Chunking of utterances into prosodic phrases, which
assists with correct interpretation. Consider the sentence “he leaves
tomorrow”, said four ways (as a statement, question, command, or
sarcastically).
To handle prosody, you need to develop:
 Suitable representation of prosody
 Algorithms to automatically detect prosody
 Methods to integrate these detectors in speech applications
To represent prosody, you extract features from the pitch contours of last
200 msec of utterances and then convert the parameters into a
discretized (categorical) notation.

Shri Narayanan and students in the EE Department at USC, and others
elsewhere, are detecting three features: pitch (‘height’ of voice), intensity
(loudness), and breaks (inter-word spaces). They use the ToBI (TOnes and
Break Indices) representation. Procedure: to find the best sequence of prosody labels L
 They assign a prosodic label to each word conditioned on
contextual features
 They train continuous density Hidden Markov Models (HMMs) to
represent pitch accent and boundary tone — 3 states
They use the following kinds of features:
 Lexical features: Orthographic word identity
 Syntactic features: POS tags, Supertags (Similar to shallow syntactic
parse)
 Acoustic features: f0 and energy extracted over 10msec frames

5.Current status
Applications:
1.General-purpose dictation: Several commercial systems for $100:
 DragonDictate (used to be Dragon Systems; by Jim Baker); now at
Nuance (www.nuance.com)
 IBM ViaVoice (from Jim Baker at IBM)
 Whatever was left when Lernout and Hauspie (was Kurzweil) went
bankrupt
 Kai-Fu Lee takes SPHINX from CMU to Apple (PlainTalk) and then to
Microsoft Beijing
 Windows Speech Recognition in Windows Vista
2.Military: Handheld devices for speech-to-speech translation in Iraq
and elsewhere. Also used in fighter planes where the pilot’s hands are
too busy to type.
3.Healthcare: ASR for doctors in order to create patient records
automatically.
4.Autos: Speech devices take driver input and display routes, maps, etc.
5.Help for disabled (esp to access the web and control the computer).

Some research projects:

DARPA: ATIS Travel Agent (early 1990s); GALE program (mid-2000s)
MIT: GALAXY global weather, restaurants, etc.
Dutch Railways: train information by phone
DARPA: COMMUNICATOR travel dialogues; BABYLON handheld
translation devices

Current topics: Interfaces


Speech systems in human-computer interfaces.

Problems for ASR


 Voices differ (men, women, children)
 Accents
 Speaking speed (overall, specific cadences)
 Pitch variation (high, low)
 Word and sentence boundaries
 Background noise — BIG PROBLEM
 Genuine ambiguity: “recognize speech” vs. “wreck a nice beach”

Speech Synthesis:
Traditional model

Lexicon of sounds for letters. Problem: the output sounds flat.

Enhance: add a sentence prosody contour. Also need Speech Act and focus/stress as
input.

Concatenative synthesis

Record speaker many times; create lexicon of sounds for letters, in word
start/middle/end, sentence start/middle/end, stress/unstress, etc. forms

At run time, choose the most fitting variant, depending on neighboring options
(intensity/loudness, speed, etc.). Problems: obtaining clean sound units for letters,
matching disfluencies.

Optional Readings

Victor Zue’s course at MIT.
Jelinek, F. 1998. Statistical Methods for Speech Recognition.
Rabiner, L. 1993. Fundamentals of Speech Recognition.
Schroeder, M.R. 2004. Computer Speech (2nd ed).
Karat, C.-M., J. Vergo, and D. Nahamoo. 2007. Conversational Interface
Technologies. In A. Sears and J.A. Jacko (eds) The Human-Computer Interaction
Handbook: Fundamentals, Evolving Technologies, and Emerging Applications
(Human Factors and Ergonomics). Lawrence Erlbaum Associates Inc.

Text-to-Speech:
Text-to-speech (TTS) is a type of  assistive technology that reads digital text
aloud. It’s sometimes called “read aloud” technology.

With a click of a button or the touch of a finger, TTS can take words on a
computer or other digital device and convert them into audio. TTS is very
helpful for kids who struggle with reading. But it can also help kids with writing
and editing, and even focusing.

How Text-to-Speech Works

TTS works with nearly every personal digital device, including computers,
smartphones and tablets. All kinds of text files can be read aloud, including
Word and Pages documents. Even online web pages can be read aloud.

The voice in TTS is computer-generated, and reading speed can usually be sped
up or slowed down. Voice quality varies, but some voices sound human. There
are even computer-generated voices that sound like children speaking.

Many TTS tools highlight words as they are read aloud. This allows kids to see
text and hear it at the same time.

Some TTS tools also have a technology called optical character
recognition (OCR). OCR allows TTS tools to read text aloud from images. For
example, your child could take a photo of a street sign and have the words on
the sign turned into audio.
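As an illustration, a minimal TTS sketch using the pyttsx3 library (one of several offline speech engines, assumed here to be installed) looks like this:

# Minimal text-to-speech sketch using the pyttsx3 library (offline, cross-platform).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)        # reading speed can be sped up or slowed down
engine.say("Text to speech reads digital text aloud.")
engine.runAndWait()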

How Text-to-Speech Can Help Your Child

Print materials in the classroom—like books and handouts—can create
obstacles for kids with reading issues. That’s because some kids struggle
with decoding and understanding printed words on the page. Using digital text
with TTS helps remove these barriers.

Did you know that your child may be eligible for free digital text-to-speech
books? 

And since TTS lets kids both see and hear text when reading, it creates
a multisensory reading experience. Researchers have found that the
combination of seeing and hearing text when reading:

 Improves word recognition

 Increases the ability to pay attention and remember information while
reading

 Allows kids to focus on comprehension instead of sounding out words

 Increases kids’ staying power for reading assignments

 Helps kids recognize and fix errors in their own writing

Types of Text-to-Speech Tools

Depending on the device your child uses, there are many different TTS tools:

 Built-in text-to-speech: Many devices have built-in TTS tools. This
includes desktop and laptop computers, smartphones and digital tablets,
and Chrome. Your child can use this TTS without purchasing special apps
or software.

 Web-based tools: Some websites have TTS tools on-site. For instance,
some websites offer a “Reading Assist” tool that reads the webpage aloud. Also, kids
with dyslexia may qualify for a free Bookshare account with digital
books that can be read with TTS. (Bookshare is a program of Understood
founding partner Benetech.)

 Text-to-speech apps: Kids can also download TTS apps on smartphones
and digital tablets. These apps often have special features like text
highlighting in different colors and OCR. Some examples include Voice
Dream Reader, Claro ScanPen and Office Lens.
 Chrome tools: Chrome is a relatively new platform with several TTS
tools. These include Read&Write for Google Chrome and Snap&Read
Universal. You can use these tools on a Chromebook or any computer
with the Chrome browser

 Text-to-speech software programs: There are also several literacy
software programs for desktop and laptop computers. In addition to
other reading and writing tools, many of these programs have TTS.
Examples include Kurzweil 3000, ClaroRead and Read&Write. Microsoft’s
Immersive Reader tool also has TTS. It can be found in programs like
OneNote and Word.

How Your Child Can Access Text-to-Speech at School

It’s a good idea to start the conversation with your child’s teacher if you think
your child would benefit from TTS. If your child has an Individualized Education
Program (IEP) or a 504 plan, your child has a right to the assistive technology
she needs to learn. But even without an IEP or a 504 plan, a school may be
willing to provide TTS if it can help your child.
Chapter 5
Introduction to Image Processing & Computer Vision

Introduction to Image processing:


Image processing is a method to perform some operations on an image, in
order to get an enhanced image or to extract some useful information from it.
It is a type of signal processing in which input is an image and output may be
image or characteristics/features associated with that image. Nowadays, image
processing is among the most rapidly growing technologies. It forms a core research
area within engineering and computer science disciplines too.
Image processing basically includes the following three steps:

 Importing the image via image acquisition tools;
 Analysing and manipulating the image;
 Output in which result can be altered image or report that is based on
image analysis.

There are two types of methods used for image processing, namely analogue
and digital image processing.

An image is represented as a two-dimensional signal, analog or digital, that
contains colour information arranged along the x and y spatial axes.
The value of f(x, y) at any point gives the pixel value at that point of an
image.

The full form of the pixel is "Picture Element." It is also known as "PEL." Pixel is
the smallest element of an image on a computer display, whether they are LCD
or CRT monitors. A screen is made up of a matrix of thousands or millions of
pixels. A pixel is represented with a dot or a square on a computer screen.
A digital image displayed on a computer screen may look like a picture, but it is
actually nothing but a two-dimensional array of numbers, typically ranging
between 0 and 255.
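As a small illustration of this, the sketch below loads an image with OpenCV and inspects the underlying array of pixel values; the file name 'photo.jpg' is a placeholder.

# Minimal sketch of inspecting an image as a 2-D array of pixel values; 'photo.jpg' is a placeholder file.
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)   # 2-D array of values between 0 and 255
print(img.shape)              # (height, width)
print(img[0, 0])              # intensity of the top-left pixel
print(img.min(), img.max())   # range of pixel values in this image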
The analog image processing is applied on analog signals and it processes only
two-dimensional signals. The images are manipulated by electrical signals. In
analog image processing, analog signals can be periodic or non-periodic.
Examples of analog images are television images, photographs, paintings, and
medical images etc.

Digital image processing is applied to digital images (a matrix of small pixels
and elements). For manipulating the images, a number of software packages and
algorithms are applied to perform changes. Digital image processing is one
of the fastest growing industries and affects everyone's life. Examples of
digital image processing are color processing, image recognition, video processing, etc.

The following are some differences between analog image processing and digital
image processing:

Analog Image Processing                            Digital Image Processing
Applied on analog signals; processes only          Applied to digital signals; works on analyzing
two-dimensional signals.                           and manipulating the images.
Analog signals are time-varying, so the images     It improves the digital quality of the image,
formed under analog image processing get           and intensity distribution is perfect in it.
varied.
A slower and costlier process.                     A cheaper and faster image storage and
                                                   retrieval process.
Analog signals are real-world but do not give      It uses good image compression techniques
good quality images.                               that reduce the amount of data required and
                                                   produce good quality images.
It is generally continuous and not broken into     It uses an image segmentation technique
tiny components.                                   which is used to detect discontinuity which
                                                   occurs due to a broken connection path.

Overlapping fields:
Machine/Computer vision

Machine vision or computer vision deals with developing a system in which
the input is an image and the output is some information. For example:
developing a system that scans a human face and opens any kind of lock.

Computer graphics

Computer graphics deals with the formation of images from object models,
rather than images captured by some device. For example: object
rendering, i.e. generating an image from an object model.

Artificial intelligence
Artificial intelligence is more or less the study of putting human intelligence
into machines. Artificial intelligence has many applications in image
processing. For example: developing computer aided diagnosis systems that
help doctors in interpreting images such as X-rays, MRIs, etc., and then highlighting
conspicuous sections to be examined by the doctor.

Applications of Digital Image Processing


Some of the major fields in which digital image processing is widely used are
mentioned below
 Image sharpening and restoration
 Medical field
 Remote sensing
 Transmission and encoding
 Machine/Robot vision
 Color processing
 Pattern recognition
 Video processing
 Microscopic Imaging
 Others

Image Noise:
Noise represents unwanted information which deteriorates image quality.
Noise is a random variation of image intensity and visible as grains in the
image. Noise means, pixels within the picture present different intensity values
rather than correct pixel values. Noise originates from the physical nature of
detection processes and has many specific forms and causes, Noise is defined
as a process (n) which affects the acquired image (f) and is not part of the
scene (initial signal-s), and so the noise model can be written as f(i, j) = s( i, j) +
n(i, j). Digital image noise may come from various sources. The acquisition
process for digital images converts optical signals into electrical signals and
then into digital signals and is one process by which noise is introduced
in digital images.
TYPES OF NOISE:
During image acquisition or transmission, several factors are responsible for
introducing noise in the image. Depending on the types of disturbance, the
noise can affect the image to different extent. Our main concern is to remove
certain kind of noise. So we have to first identify certain type of noise and
apply different algorithms to remove the noise. The common types of noise are:
1. Salt Pepper Noise
2. Poisson Noise
3. Gaussian Noise
4. Speckle Noise

1. Salt Pepper Noise:


Salt and pepper noise is an impulse type of noise. It appears as
intensity spikes. This type of noise arises due to errors in data
transmission. This noise occurs in the image because of sharp and
sudden changes of image signal. For images corrupted by salt and
pepper noise the noisy pixels can take only the maximum and the
minimum values in the dynamic range. It is found that an 8- bit image,
the typical value for pepper noise is 0 and for salt noise it is 255. The salt
and pepper noise is generally caused by malfunctioning of pixel
elements in the camera sensors, faulty memory locations or timing
errors in the digitization process.

2. Poisson Noise:
Poisson or shot photon noise is the noise that is caused when the number
of photons sensed by the sensor is not sufficient to provide detectable
statistical information. Shot noise exists because a phenomenon such as
light and electric current consists of the movement of discrete packets. Shot
noise may be dominated when the finite number of particles that carry
energy is sufficiently small so that uncertainties due to the Poisson
distribution, which describe the occurrence of independent random events,
are of significance. Magnitude of this noise increase with the average
magnitude of the current or intensity of the light.

3. Gaussian Noise:
Gaussian noise is evenly distributed over signal. This means that each
pixel in the noisy image is the sum of the true pixel value and a random
Gaussian distributed noise value. The noise is independent of intensity
of pixel value at each point. A special case is white Gaussian noise, in
which the values at any pair of times are identically distributed and
statistically independent. White noise draws its name from white light.
Principal sources of Gaussian noise in digital images arise during
acquisition, for example sensor noise caused by poor illumination or high
temperature or transmission.

4. Speckle Noise:
Speckle noise is multiplicative noise unlike the Gaussian and salt pepper
noise. This noise can be modeled by random value multiplications with
pixel values of the image and can be expressed as
P = I + n * I, where P is the speckle noise distribution image, I is the input
image and n is the uniform noise image with mean 0 and variance v.
Speckle noise is commonly observed in radar sensing system, although it
may appear in any type of remotely sensed image utilizing coherent
radiation. Like the light from a laser, the waves emitted by active sensors
travel in phase and interact minimally on their way to the target area.
Reducing the effect of speckle noise permits both better discrimination
of scene targets and easier automatic image segmentation.

Removal of Noise from Image:


Image de-noising is very important task in image processing for the analysis of
images. One goal in image restoration is to remove the noise from the image in
such a way that the original image is discernible. In modern digital image
processing data de-noising is a well- known problem and it is the concern of
diverse application areas. Image de-noising is often used in the field of
photography or publishing where image was somehow degraded but needs to
be improved before it can be printed. When we have a model for the
degradation process, the inverse process can be applied to the image to
restore it back to the original form.
There are two types of noise removal approaches (i) linear filtering (ii)
nonlinear filtering.

Linear Filtering: Linear filters are used to remove certain types of noise. These
filters remove noise by convolving the original image with a mask that
represents a low-pass filter or smoothing operation. The output of a linear
operation due to the sum of two inputs is the same as performing the
operation on the inputs individually and then summing the results. These
filters also tend to blur the sharp edges, destroy the lines and other fine details
of the image. Linear methods are fast but they do not preserve the details of
the image.

Non-Linear Filtering: Non- linear filter is a filter whose output is not a linear
function of its inputs. Non-linear filters preserve the details of the image. Non-
linear filters have many applications, especially removal of certain types of
noise that are not additive. Non-linear filters are considerably harder to use
and design than linear ones.
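As an illustration of the two filtering families, the sketch below applies a linear Gaussian filter and a non-linear median filter with OpenCV; 'noisy.png' is a placeholder input image, and the kernel sizes are illustrative.

# Minimal de-noising sketch with OpenCV: a linear (Gaussian) filter and a non-linear (median) filter.
import cv2

noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)      # placeholder input image
linear = cv2.GaussianBlur(noisy, (5, 5), 0)                # linear smoothing; tends to blur edges
nonlinear = cv2.medianBlur(noisy, 5)                       # non-linear; works well on salt-and-pepper noise
cv2.imwrite("denoised_median.png", nonlinear)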

Color Enhancement:
Image enhancement is the process of adjusting digital images so that the
results are more suitable for display or further image analysis. For example,
you can remove noise, sharpen, or brighten an image, making it easier
to identify key features.

Here are some useful examples and methods of image enhancement:

 Filtering with morphological operators
 Histogram equalization
 Noise removal using a Wiener filter
 Linear contrast adjustment
 Median filtering
 Unsharp mask filtering
 Contrast-limited adaptive histogram equalization (CLAHE)
 Decorrelation stretch
A few of these examples:

 Correcting nonuniform illumination with morphological operators.
 Enhancing grayscale images with histogram equalization.
 Deblurring images using a Wiener filter.

Image enhancement algorithms include deblurring, filtering, and contrast
methods.
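As an illustration, histogram equalization (one of the enhancement methods listed above) can be sketched with OpenCV as follows; 'dark.png' is a placeholder input image.

# Minimal contrast-enhancement sketch: histogram equalization with OpenCV.
import cv2

gray = cv2.imread("dark.png", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(gray)              # spreads intensity values over the full 0-255 range
cv2.imwrite("equalized.png", equalized)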

Segmentation:
In digital image processing and computer vision, image segmentation is the
process of partitioning a digital image into multiple segments (sets of pixels,
also known as image objects). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more meaningful
and easier to analyze.[1][2] Image segmentation is typically used to locate
objects and boundaries (lines, curves, etc.) in images. More precisely, image
segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover
the entire image, or a set of contours extracted from the image (see edge
detection). Each of the pixels in a region are similar with respect to some
characteristic or computed property, such as color, intensity, or texture.
Adjacent regions are significantly different with respect to the same
characteristic(s).[1] When applied to a stack of images, typical in medical
imaging, the resulting contours after image segmentation can be used to
create 3D reconstructions with the help of interpolation algorithms
like marching cubes.
Some of the practical applications of image segmentation are:

 Content-based image retrieval[4]
 Machine vision
 Medical imaging,[5][6] including volume rendered images from computed
tomography and magnetic resonance imaging.
o Locate tumors and other pathologies[7][8]
o Measure tissue volumes
o Diagnosis, study of anatomical structure[9]
o Surgery planning
o Virtual surgery simulation
o Intra-surgery navigation
 Object detection[10]
o Pedestrian detection
o Face detection
o Brake light detection
o Locate objects in satellite images (roads, forests, crops, etc.)
 Recognition Tasks
o Face recognition
o Fingerprint recognition
o Iris recognition
 Traffic control systems
 Video surveillance
 Video object co-segmentation and action localization[11][12]
Examples

Autonomous Driving

When designing perception for autonomous vehicles, such as self-driving
cars, semantic segmentation is popularly used to help the system identify and
locate vehicles and other objects on the road. Semantic segmentation associates
each pixel of the image with a class label (such as car, road, sky, pedestrian, or bike).

Image segmentation involves converting an image into a collection of regions
of pixels that are represented by a mask or a labeled image. By dividing an
image into segments, you can process only the important segments of the
image instead of processing the entire image.

A common technique is to look for abrupt discontinuities in pixel values, which
typically indicate edges that define a region. For example, thresholding converts an
image into a binary image, which can improve the legibility of text in an image.

Another common approach is to detect similarities in the regions of an image.
Some techniques that follow this approach are region growing, clustering, and
thresholding. Regions can be segmented based on color values, shapes, or texture.

A variety of other approaches to perform image segmentation have been
developed over the years, using domain-specific knowledge to effectively solve
segmentation problems in specific application areas.
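As an illustration of threshold-based segmentation, the sketch below produces a binary mask with Otsu's method in OpenCV; 'page.png' is a placeholder input image.

# Minimal segmentation sketch: Otsu thresholding to a binary image with OpenCV.
import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # threshold chosen automatically
cv2.imwrite("segmented.png", binary)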

Edge Detection:
Edge detection includes a variety of mathematical methods that aim at
identifying points in a digital image at which the image brightness changes
sharply or, more formally, has discontinuities. The points at which image
brightness changes sharply are typically organized into a set of curved line
segments termed edges. The same problem of finding discontinuities in one-
dimensional signals is known as step detection and the problem of finding
signal discontinuities over time is known as change detection. Edge detection is
a fundamental tool in image processing, machine vision and computer vision,
particularly in the areas of feature detection and feature extraction.

Motivation:
The purpose of detecting sharp changes in image brightness is to capture
important events and changes in properties of the world. It can be shown that
under rather general assumptions for an image formation model,
discontinuities in image brightness are likely to correspond to:[2][3]

 discontinuities in depth,
 discontinuities in surface orientation,
 changes in material properties and
 variations in scene illumination.
In the ideal case, the result of applying an edge detector to an image may lead
to a set of connected curves that indicate the boundaries of objects, the
boundaries of surface markings as well as curves that correspond to
discontinuities in surface orientation. Thus, applying an edge detection
algorithm to an image may significantly reduce the amount of data to be
processed and may therefore filter out information that may be regarded as
less relevant, while preserving the important structural properties of an image.
If the edge detection step is successful, the subsequent task of interpreting the
information contents in the original image may therefore be substantially
simplified. However, it is not always possible to obtain such ideal edges from
real life images of moderate complexity.
Edges extracted from non-trivial images are often hampered by fragmentation,
meaning that the edge curves are not connected, missing edge segments as
well as false edges not corresponding to interesting phenomena in the image –
thus complicating the subsequent task of interpreting the image data.[4]
Edge detection is one of the fundamental steps in image processing, image
analysis, image pattern recognition, and computer vision techniques.
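As an illustration, a common edge detector (Canny) can be applied with OpenCV as sketched below; 'scene.png' and the two hysteresis thresholds are placeholders.

# Minimal edge-detection sketch using the Canny detector in OpenCV.
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 100, 200)      # lower/upper hysteresis thresholds on gradient magnitude
cv2.imwrite("edges.png", edges)        # white pixels mark points of sharp brightness change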

Optical Character Recognition:


Optical character recognition is usually abbreviated as OCR. It includes the
mechanical and electrical conversion of scanned images of handwritten,
typewritten text into machine text. It is common method of digitizing printed
texts so that they can be electronically searched, stored more compactly,
displayed on line, and used in machine processes such as machine translation,
text to speech and text mining.
In recent years, OCR (Optical Character Recognition) technology has been
applied throughout the entire spectrum of industries, revolutionizing the
document management process. OCR has enabled scanned documents to
become more than just image files, turning into fully searchable documents
with text content that is recognized by computers. With the help of OCR,
people no longer need to manually retype important documents when
entering them into electronic databases. Instead, OCR extracts relevant
information and enters it automatically. The result is accurate, efficient
information processing in less time.
Optical character recognition has multiple research areas but the most
common areas are as following:
Banking
The uses of OCR vary across different fields. One widely known application is
in banking, where OCR is used to process checks without human involvement.
A check can be inserted into a machine, the writing on it is scanned instantly,
and the correct amount of money is transferred. This technology has nearly
been perfected for printed checks, and is fairly accurate for handwritten
checks as well, though it occasionally requires manual confirmation. Overall,
this reduces wait times in many banks.
Blind and visually impaired persons
One of the major motivations at the beginning of OCR research was that
scientists wanted to make a computer or device which could read books out loud
to blind people. Out of this research came the flatbed scanner, which
is most commonly known to us as the document scanner.
Legal department
In the legal industry, there has also been a significant movement to digitize
paper documents. In order to save space and eliminate the need to sift
through boxes of paper files, documents are being scanned and entered into
computer databases. OCR further simplifies the process by making documents
text-searchable, so that they are easier to locate and work with once in the
database. Legal professionals now have fast, easy access to a huge library of
documents in electronic format, which they can find simply by typing in a few
keywords.
Retail Industry
Barcode recognition technology is also related to OCR. We see the use of this
technology in our common day use.
Other Uses
OCR is widely used in many other fields, including education, finance, and
government agencies. OCR has made countless texts available online, saving
money for students and allowing knowledge to be shared. Invoice imaging
applications are used in many businesses to keep track of financial records
and prevent a backlog of payments from piling up. In government agencies
and independent organizations, OCR simplifies data collection and analysis,
among other processes. As the technology continues to develop, more and
more applications are found for OCR technology, including increased use of
handwriting recognition.
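As an illustration, a minimal OCR sketch using the pytesseract wrapper (which assumes the Tesseract engine is installed) looks like this; 'scan.png' is a placeholder input image.

# Minimal OCR sketch using pytesseract (requires the Tesseract engine to be installed).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("scan.png"))
print(text)    # the recognized text, now searchable and machine-readable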

Feature Detection:
In computer vision and image processing feature detection includes methods
for computing abstractions of image information and making local decisions at
every image point whether there is an image feature of a given type at that
point or not. The resulting features will be subsets of the image domain, often
in the form of isolated points, continuous curves or connected regions.
Definition of Feature:
A feature is typically defined as an "interesting" part of an image, and features
are used as a starting point for many computer vision algorithms.

Types of Image Features:


1. Edges
Edges are points where there is a boundary (or an edge) between two image
regions. In general, an edge can be of almost arbitrary shape, and may include
junctions. In practice, edges are usually defined as sets of points in the image
which have a strong gradient magnitude. Furthermore, some common
algorithms will then chain high gradient points together to form a more
complete description of an edge. These algorithms usually place some
constraints on the properties of an edge, such as shape, smoothness, and
gradient value.
Locally, edges have a one-dimensional structure.

2. Corners / interest points

The terms corners and interest points are used somewhat interchangeably and
refer to point-like features in an image, which have a local two-dimensional
structure. The name "corner" arose since early algorithms first
performed edge detection, and then analysed the edges to find rapid changes
in direction (corners).
3. Blobs / regions of interest points
Blobs provide a complementary description of image structures in terms of
regions, as opposed to corners that are more point-like. Nevertheless, blob
descriptors may often contain a preferred point (a local maximum of an
operator response or a center of gravity) which means that many blob
detectors may also be regarded as interest point operators. Blob detectors can
detect areas in an image which are too smooth to be detected by a corner
detector.
Main Component of Feature Detection

 Detection: Identify the Interest Point

 Description: The local appearance around each feature point is described
in some way that is (ideally) invariant under changes in illumination,
translation, scale, and in-plane rotation. We typically end up with a
descriptor vector for each feature point.
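As an illustration of detection and description, the sketch below extracts ORB keypoints and descriptor vectors with OpenCV; 'object.png' and the feature count are placeholders chosen only for the example.

# Minimal feature detection/description sketch using ORB in OpenCV.
import cv2

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)   # interest points + descriptor vectors
print(len(keypoints), descriptors.shape)                    # e.g. up to 500 keypoints, 32-byte descriptors each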

Recognition:
Image recognition, in the context of machine vision, is the ability of software
to identify objects, places, people, writing and actions in images. Computers
can use machine vision technologies in combination with a camera
and artificial intelligence software to achieve image recognition.

Image recognition is used to perform a large number of machine-based visual
tasks, such as labeling the content of images with meta-tags, performing image
content search and guiding autonomous robots, self-driving cars and accident
avoidance systems.

While human and animal brains recognize objects with ease, computers have
difficulty with the task. Software for image recognition requires deep machine
learning. Performance is best on convolutional neural net processors as the
specific task otherwise requires massive amounts of power for its compute-
intensive nature. Image recognition algorithms can function by use of
comparative 3D models, appearances from different angles using edge
detection or by components. Image recognition algorithms are often trained
on millions of pre-labeled pictures with guided computer learning.

Current and future applications of image recognition include smart photo
libraries, targeted advertising, the interactivity of media, accessibility for the
visually impaired and enhanced research
capabilities. Google, Facebook, Microsoft, Apple and Pinterest are among the
many companies that are investing significant resources and research into
image recognition and related applications. Privacy concerns over image
recognition and similar technologies are controversial as these companies can
pull a large volume of data from user photos uploaded to their social media
platforms.
