CI
1. Mini-Max Algorithm
The working of the minimax algorithm is easily described using an example: a game tree representing a two-player game.
In this example there are two players, one called the Maximizer and the other called the Minimizer. The Maximizer tries to obtain the maximum possible score, while the Minimizer tries to obtain the minimum possible score.
The algorithm applies DFS, so it traverses the game tree all the way down to the terminal nodes (the leaves).
At the terminal nodes the utility values are given, so we compare those values and back them up the tree, level by level, until the initial state is reached.
Properties of the Mini-Max Algorithm:
Complete- The Min-Max algorithm is complete. It will definitely find a solution (if one exists) in a finite search tree.
Optimal- The Min-Max algorithm is optimal if both opponents play optimally.
Time complexity- As it performs DFS on the game tree, the time complexity of the Min-Max algorithm is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.
Space complexity- The space complexity of the Min-Max algorithm is similar to DFS, which is O(bm).
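For illustration (not part of the original notes), a minimal Python sketch of the minimax recursion described above, assuming the game tree is given as nested lists whose innermost values are the terminal utilities:

def minimax(node, is_maximizer):
    """Return the minimax value of a game-tree node.
    A node is either a terminal score (a number) or a list of child nodes."""
    if not isinstance(node, list):                 # terminal node: return its utility
        return node
    values = [minimax(child, not is_maximizer) for child in node]
    return max(values) if is_maximizer else min(values)

tree = [[3, 5], [2, 9]]                            # depth-2 example tree, Maximizer moves first
print(minimax(tree, True))                         # backed-up value at the root: 3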
2. Alpha-Beta Pruning
Note: To better understand this topic, kindly study the minimax algorithm.
The main condition required for alpha-beta pruning is: α >= β. Here alpha is the best (highest) value found so far along the path for the Maximizer, and beta is the best (lowest) value found so far for the Minimizer; whenever α >= β holds at a node, the remaining branches of that node are pruned because they cannot influence the final decision.
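A hedged sketch of the same recursion with alpha-beta pruning added, using the nested-list tree representation assumed above; the alpha >= beta test is exactly where branches are cut off:

import math

def alphabeta(node, is_maximizer, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning on a nested-list game tree."""
    if not isinstance(node, list):                 # terminal node
        return node
    if is_maximizer:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                      # Minimizer will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                          # Maximizer will never allow this branch
            break
    return value

print(alphabeta([[3, 5], [2, 9]], True))           # 3; the leaf 9 is pruned after 2 is seen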
3. Expert System
An expert system is a computer program designed to solve complex problems and to provide decision-making ability like a human expert. It does this by extracting knowledge from its knowledge base, using reasoning and inference rules according to the user's queries.
The performance of an expert system is based on the expert knowledge stored in its knowledge base. The more knowledge stored in the KB, the more the system improves its performance.
Characteristics of Expert System:
High Performance: The expert system provides high performance for solving any type of complex problem of a specific domain with high efficiency and accuracy.
Understandable: It responds in a way that can be easily understood by the user. It can take input in human language and provides the output in the same way.
Reliable: It is highly reliable for generating efficient and accurate output.
Highly responsive: An ES provides the result for any complex query within a very short period of time.
An expert system mainly consists of three components:
User Interface
Inference Engine
Knowledge Base
1. User Interface
With the help of a user interface, the expert system interacts with the user, takes queries as input in a readable format, and passes them to the inference engine. After getting the response from the inference engine, it displays the output to the user. In other words, it is an interface that helps a non-expert user communicate with the expert system to find a solution.
2. Inference Engine(Rules of Engine)
The inference engine is known as the brain of the expert system as it is the main processing unit
of the system. It applies inference rules to the knowledge base to derive a conclusion or deduce
new information. It helps in deriving an error-free solution of queries asked by the user.
With the help of an inference engine, the system extracts the knowledge from the knowledge
base.
There are two types of inference engine:
Deterministic Inference engine: The conclusions drawn from this type of inference engine are
assumed to be true. It is based on facts and rules.
Probabilistic Inference engine: This type of inference engine contains uncertainty in its conclusions and is based on probability.
Forward Chaining: It starts from the known facts and rules, and applies the inference rules to
add their conclusion to the known facts.
Backward Chaining: It is a backward reasoning method that starts from the goal and works backward, through the rules, to the known facts that support the goal.
3. Knowledge Base
The knowledge base is a type of storage that stores knowledge acquired from different experts of a particular domain. It is considered a large store of knowledge. The larger and more accurate the knowledge base, the more precise the expert system will be.
It is similar to a database that contains information and rules of a particular domain or subject.
One can also view the knowledge base as collections of objects and their attributes. Such as a
Lion is an object and its attributes are it is a mammal, it is not a domestic animal, etc.
Factual Knowledge: The knowledge which is based on facts and accepted by knowledge
engineers comes under factual knowledge.
Heuristic Knowledge: This knowledge is based on practice, the ability to guess, evaluation, and
experiences.
Knowledge Representation: It is used to formalize the knowledge stored in the knowledge base using
the If-else rules.
Knowledge Acquisition: It is the process of extracting, organizing, and structuring the domain knowledge, specifying the rules to acquire the knowledge from various experts, and storing that knowledge in the knowledge base.
Advantages of Expert System:
1. No memory limitations: It can store as much data as required and can recall it at the time of application, whereas human experts are limited in how much they can remember at any given time.
2. High Efficiency: If the knowledge base is updated with the correct knowledge, then it provides a
highly efficient output, which may not be possible for a human.
3. Expertise in a domain: There are many human experts in each domain, each with different skills and different experiences, so it is not easy to get a single final answer for a query. But if we put the knowledge gained from human experts into the expert system, it provides an efficient output by combining all the facts and knowledge.
4. Not affected by emotions: These systems are not affected by human emotions such as fatigue, anger, depression, or anxiety, so the performance remains constant.
5. High security: These systems provide high security to resolve any query.
6. Considers all the facts: To respond to any query, it checks and considers all the available facts
and provides the result accordingly. But it is possible that a human expert may not consider
some facts due to any reason.
7. Regular updates improve the performance: If there is an issue in the result provided by the
expert systems, we can improve the performance of the system by updating the knowledge
base.
Capabilities of Expert System:
Advising: It is capable of advising a human being on queries in the particular domain of the ES.
Provide decision-making capabilities: It provides the capability of decision making in any
domain, such as for making any financial decision, decisions in medical science, etc.
Demonstrate a device: It is capable of demonstrating any new products such as its features,
specifications, how to use that product, etc.
Problem-solving: It has problem-solving capabilities.
Explaining a problem: It is also capable of providing a detailed description of an input problem.
Interpreting the input: It is capable of interpreting the input given by the user.
Predicting results: It can be used for the prediction of a result.
Diagnosis: An ES designed for the medical field is capable of diagnosing a disease without using
multiple components as it already contains various inbuilt medical tools.
Limitations of Expert System:
The response of the expert system may be wrong if the knowledge base contains wrong information.
Like a human being, it cannot produce a creative output for different scenarios.
Its maintenance and development costs are very high.
Knowledge acquisition for designing an ES is very difficult.
For each domain, we require a specific ES, which is one of the big limitations.
It cannot learn from itself and hence requires manual updates.
4. Backward and Forward Chaining
Backward Chaining
In logic programming and artificial intelligence, backward chaining is the technique used to get from an objective (goal) back to the assumptions or conditions that support it.
Backward chaining starts with a hypothesis or objective and works backward through a set of conditions or rules to see whether the goal is supported by those conditions. The system verifies each requirement until it reaches a point where all requirements are met, or until it reaches a requirement that cannot be met, at which point it terminates and reports the outcome.
Backward chaining, for instance, could be employed in a medical diagnosis system to identify the
primary reason behind a group of symptoms. In order to identify the diseases or disorders that might be
producing such symptoms, the system starts with the symptoms as the goal and works backward
through a series of criteria and conditions.
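As a rough sketch, goal-driven (backward) chaining can be written in a few lines of Python; the rule and fact names below are hypothetical placeholders for a diagnosis-style knowledge base:

rules = {                       # hypothetical rules: goal -> lists of premises that prove it
    "flu": [["fever", "cough"]],
    "fever": [["high_temperature"]],
}
facts = {"high_temperature", "cough"}

def prove(goal):
    """True if the goal is a known fact or all premises of some rule for it can be proven."""
    if goal in facts:
        return True
    return any(all(prove(p) for p in premises) for premises in rules.get(goal, []))

print(prove("flu"))             # True: fever follows from high_temperature, cough is a fact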
Disadvantages of Backward Chaining:
Restricted reasoning − Backward chaining only works in one direction and might not be able to produce fresh insights or solutions that were not specifically coded into the system.
Incomplete search − Backward chaining occasionally generates partial findings or fails to fully
investigate all potential solutions.
Handling conflicts − Conflict resolution may be challenging when using backward chaining to reconcile inconsistencies or conflicts between several rules or facts.
Forward Chaining
By starting with the premises or conditions and applying them one at a time to arrive at a conclusion,
forward chaining is a reasoning technique used in artificial intelligence and logic programming.
By applying a set of rules to an initial set of facts or circumstances, the system can then generate new
facts or conditions. This process is known as forward chaining. The system keeps using these rules and
producing new facts until it reaches a conclusion or a goal.
For instance, forward chaining might be employed in a rule-based system for diagnosing automobile issues to identify a specific problem with the vehicle. Starting with observations of the car's behaviour, the system applies a set of rules to generate potential causes of the issue. As it keeps applying the rules and ruling out unlikely explanations, it narrows the options and eventually comes to a conclusion about the issue.
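A hedged Python sketch of data-driven (forward) chaining in the spirit of the car-diagnosis example above; the rules and facts are illustrative placeholders, not an actual diagnostic rule base:

rules = [                                          # hypothetical rules: (premises, conclusion)
    ({"engine_cranks", "no_start"}, "fuel_or_spark_problem"),
    ({"fuel_or_spark_problem", "fuel_gauge_empty"}, "out_of_fuel"),
]
facts = {"engine_cranks", "no_start", "fuel_gauge_empty"}

changed = True
while changed:                                     # keep firing satisfied rules until no new fact appears
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)                                       # now also contains the two derived conclusions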
Disadvantages of Forward Chaining
Incomplete search: In some circumstances, forward chaining may not fully investigate all potential
solutions or may produce partial results.
Absence of a global perspective: As forward chaining simply takes into account the current set of facts or
circumstances, it might not evaluate the problem's wider context, which could result in inaccurate
conclusions.
Difficulty in handling conflicts: Conflict resolution may be challenging with forward chaining when there
are inconsistencies or conflicts between several facts or rules.
5. Informed Search Algorithms
An informed search algorithm uses knowledge such as how far we are from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents explore less of the search space and find the goal node more efficiently.
The informed search algorithm is more useful for large search spaces. Since it uses the idea of a heuristic, it is also called heuristic search.
Heuristic function: A heuristic is a function used in informed search to find the most promising path. It takes the current state of the agent as input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time. The heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path from that state to a goal state. The value of the heuristic function is always positive.
Admissibility condition: h(n) <= h*(n), where h(n) is the heuristic (estimated) cost and h*(n) is the actual optimal cost. Hence the heuristic cost should be less than or equal to the actual cost.
Greedy Best-First Search:
The greedy best-first search algorithm always selects the path which appears best at that moment. It is the combination of depth-first search and breadth-first search algorithms. It uses the heuristic function to guide the search, so best-first search allows us to take advantage of both algorithms. With the help of best-first search, at each step we can choose the most promising node. In the best-first search algorithm, we expand the node which is closest to the goal node, where the closeness is estimated by the heuristic function, i.e.
f(n) = h(n).
Advantages:
Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
It can behave like an unguided depth-first search in the worst case.
It is not optimal, and it can get stuck in loops, so it is not complete.
A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the cost to reach node n from the start state, g(n). It combines features of UCS and greedy best-first search, which lets it solve problems efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function. This algorithm expands a smaller search tree and provides optimal results faster. A* is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as follows, and this sum is called the fitness number:
f(n) = g(n) + h(n)
Advantages:
A* is complete and, with an admissible heuristic, optimal.
It can solve very complex problems.
Disadvantages:
If the heuristic is not admissible, it may not produce the shortest path, as it relies on heuristics and approximation.
A* search algorithm has some complexity issues.
The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
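A compact Python sketch of A* on a small weighted graph, assuming the graph, edge costs, and heuristic values h(n) are given as dictionaries (illustrative data); nodes are expanded in order of lowest f(n) = g(n) + h(n):

import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path, expanding nodes by f(n) = g(n) + h(n)."""
    frontier = [(h[start], 0, start, [start])]     # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for neighbour, step_cost in graph[node]:
            new_g = g + step_cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(frontier,
                               (new_g + h[neighbour], new_g, neighbour, path + [neighbour]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 5)], "G": []}
h = {"S": 7, "A": 6, "B": 2, "G": 0}               # assumed admissible heuristic values
print(a_star(graph, h, "S", "G"))                  # (8, ['S', 'A', 'B', 'G'])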
6. Genetic Algorithm
A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's theory of evolution
in Nature." It is used to solve optimization problems in machine learning. It is one of the important
algorithms as it helps solve complex problems that would take a long time to solve.
It basically involves five phases to solve the complex optimization problems, which are given as below:
Initialization
Fitness Assignment
Selection
Reproduction
Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the population. Each individual is a candidate solution to the given problem. An individual is characterized by a set of parameters called genes. Genes are joined into a string to form a chromosome, which represents a solution to the problem. One of the most popular techniques for initialization is the use of random binary strings.
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., the ability of an individual to compete with other individuals. In every iteration, individuals are evaluated using the fitness function, which assigns a fitness score to each individual. This score determines the probability of being selected for reproduction: the higher the fitness score, the greater the chances of being selected.
3. Selection
The selection phase involves selecting individuals for the reproduction of offspring. The selected individuals are arranged in pairs, and these individuals then transfer their genes to the next generation.
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step, the
genetic algorithm uses two variation operators that are applied to the parent population. The two
operators involved in the reproduction phase are given below:
Crossover: The crossover plays a most significant role in the reproduction phase of the genetic
algorithm. In this process, a crossover point is selected at random within the genes. Then the
crossover operator swaps genetic information of two parents from the current generation to
produce a new individual representing the offspring.
The genes of the parents are exchanged among themselves until the crossover point is reached. The newly generated offspring are added to the population. This process is also called recombination or crossover.
Types of crossover styles available:
o One point crossover
o Two-point crossover
o Uniform crossover
Mutation
The mutation operator inserts random genes in the offspring (new child) to maintain the
diversity in the population. It can be done by flipping some bits in the chromosomes.
Mutation helps solve the problem of premature convergence and enhances diversification.
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
5. Termination
After the reproduction phase, a stopping criterion is applied as a base for termination. The algorithm
terminates after the threshold fitness solution is reached. It will identify the final solution as the best
solution in the population.
Limitations of Genetic Algorithms:
Genetic algorithms are not efficient for solving simple problems.
It does not guarantee the quality of the final solution to a problem.
Repetitive calculation of fitness values may generate some computational challenges.
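A small Python sketch of the five phases above applied to a toy problem (maximizing the number of 1-bits in a binary string); the population size, tournament selection, and mutation rate are illustrative choices, not prescribed by the notes:

import random

def fitness(chromosome):                            # phase 2: fitness = number of 1-bits
    return sum(chromosome)

def run_ga(length=20, pop_size=30, generations=50, mutation_rate=0.01):
    # Phase 1: initialization with random binary strings
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():                               # phase 3: tournament selection of two
            a, b = random.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        next_gen = []
        while len(next_gen) < pop_size:             # phase 4: one-point crossover + bit-flip mutation
            p1, p2 = select(), select()
            point = random.randint(1, length - 1)
            child = p1[:point] + p2[point:]
            child = [1 - g if random.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)             # phase 5: stop after a fixed generation count

best = run_ga()
print(best, fitness(best))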
7. Heuristic Search
A heuristic is a technique used to solve a problem faster than classic methods, or to find an approximate solution when classic methods fail to find an exact one. Heuristics are problem-solving techniques that result in practical and quick solutions.
Heuristics are strategies that are derived from past experience with similar problems. Heuristics use
practical methods and shortcuts used to produce the solutions that may or may not be optimal, but
those solutions are sufficient in a given limited timeframe.
Limitations and Challenges of AI:
1. Data Bias:
- AI systems heavily depend on training data. If the data used for training contains biases, the AI model
can perpetuate and even amplify those biases in its predictions or decisions.
2. Lack of Understanding:
- Many AI systems operate as "black boxes," making it challenging to understand the decision-making
process. This lack of transparency can be a significant barrier, especially in critical applications where
explanations for decisions are essential.
3. Ethical Concerns:
- As AI systems become more integrated into daily life, ethical considerations become crucial. Issues
like privacy invasion, job displacement, and unintended consequences of AI decisions need to be
addressed.
4. Over-reliance on Data:
- AI models often require vast amounts of data to perform well. In situations where data is scarce or
biased, the performance of AI systems can be compromised.
Challenges of AI in Expert Tasks:
1. Contextual Understanding:
- AI models may struggle to grasp the broader context of a situation, leading to potential errors or
misinterpretations, especially in complex expert tasks where a deep understanding of the domain is
crucial.
2. Resource Requirements:
- Developing and training advanced AI models for expert tasks often demands significant computational resources and expertise, limiting accessibility for smaller organizations or researchers.
3. Interdisciplinary Challenges:
- Expert tasks often require interdisciplinary knowledge. Integrating information from diverse fields
into AI systems can be challenging and might require collaboration between experts in different
domains.
4. Security Concerns:
- In expert tasks, security is a paramount concern. AI systems used in critical domains like healthcare or
finance must be resilient against adversarial attacks and unauthorized access.
UNIT 2
1. Propositional Logic
Propositional logic (PL) is the simplest form of logic, where all statements are made up of propositions. A proposition is a declarative statement which is either true or false. It is a technique for representing knowledge in logical and mathematical form.
The syntax of propositional logic defines the allowable sentences for the knowledge representation.
There are two types of Propositions:
1. Atomic Propositions
2. Compound propositions
Atomic Proposition: Atomic propositions are simple propositions consisting of a single proposition symbol. These are sentences which must be either true or false.
Example: "2 + 2 is 4" is an atomic proposition which is true, and "The Sun is cold" is an atomic proposition which is false.
Compound Proposition: Compound propositions are constructed by combining simpler or atomic propositions using logical connectives.
Example: "It is raining today and the street is wet" is a compound proposition.
Logical Connectives: Logical connectives are used to connect two simpler propositions or to represent a sentence logically. The main connectives are negation (¬), conjunction (∧), disjunction (∨), implication (→), and biconditional (↔), each defined by its truth table.
Properties of Operators:
Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
Identity element:
o P ∧ True = P,
o P ∨ True= True.
Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
Double-negation elimination:
o ¬ (¬P) = P.
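These operator properties can be checked mechanically; a short Python sketch that verifies De Morgan's laws and double negation over every truth assignment:

from itertools import product

for p, q in product([True, False], repeat=2):       # every truth assignment of P and Q
    assert (not (p and q)) == ((not p) or (not q))  # ¬(P ∧ Q) = ¬P ∨ ¬Q
    assert (not (p or q)) == ((not p) and (not q))  # ¬(P ∨ Q) = ¬P ∧ ¬Q
    assert (not (not p)) == p                       # ¬(¬P) = P
print("All checked properties hold for every truth assignment.")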
Limitations of Propositional Logic:
We cannot represent relations like ALL, some, or none with propositional logic. Example:
1. All the girls are intelligent.
2. Some apples are sweet.
Propositional logic has limited expressive power.
In propositional logic, we cannot describe statements in terms of their properties or logical
relationships.
2. FOL
PL is not sufficient to represent complex sentences or natural language statements; propositional logic has very limited expressive power. Consider sentences such as "All the girls are intelligent" or "Some apples are sweet", which we cannot represent using PL.
To represent such statements, PL is not sufficient, so we require a more powerful logic, such as first-order logic.
First-Order logic:
The syntax of FOL determines which collection of symbols is a logical expression in first-order logic. The
basic syntactic elements of first-order logic are symbols. We write statements in short-hand notation in
FOL.
Variables: x, y, z, a, b, ...
Equality: ==
Quantifiers: ∀, ∃
Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of the
statement and second part "is an integer," is known as a predicate.
Universal Quantifier:
The universal quantifier is a symbol of logical representation which specifies that the statement within its range is true for everything or every instance of a particular thing. It is denoted by ∀. If x is a variable, then ∀x is read as:
For all x
For each x
For every x
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its scope is
true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate variable, it is called an existential quantifier.
If x is a variable, then the existential quantifier will be ∃x or ∃(x), and it will be read as:
There exists an x
For at least one x
For some x
Properties of Quantifiers:
The quantifiers interact with variables which appear in a suitable way. There are two types of variables
in First-order logic which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the
quantifier.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the
quantifier.
Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence, which applies
logical rules to the knowledge base to infer new information from known facts. The first inference
engine was part of the expert system. Inference engine commonly proceeds in two modes, which are:
1. Forward chaining
2. Backward chaining
Horn clause and definite clause are the forms of sentences, which enables knowledge base to use a
more restricted and efficient inference algorithm. Logical inference algorithms use forward and
backward chaining approaches, which require KB in the form of the first-order definite clause.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a
definite clause or strict horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as horn
clause. Hence all the definite clauses are horn clauses.
Example: the definite clause (¬p ∨ ¬q ∨ k) has exactly one positive literal, and it is equivalent to p ∧ q → k.
A. Forward Chaining
Forward chaining is also known as a forward deduction or forward reasoning method when using an
inference engine. Forward chaining is a form of reasoning which start with atomic sentences in the
knowledge base and applies inference rules (Modus Ponens) in the forward direction to extract more
data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
It is a bottom-up, data-driven approach: it moves from the known facts towards the goal, and it is commonly used in expert systems and production-rule systems.
B. Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning method when using
an inference engine. A backward chaining algorithm is a form of reasoning, which starts with the goal
and works backward, chaining through rules to find known facts that support the goal.
4. Unification
Unification is a process of making two different logical atomic expressions identical by finding a
substitution. Unification depends on the substitution process.
It takes two literals as input and makes them identical using substitution.
Let Ψ1 and Ψ2 be two atomic sentences and σ be a unifier such that Ψ1σ = Ψ2σ; then the unification is expressed as UNIFY(Ψ1, Ψ2) = σ.
Example: Find the MGU for UNIFY{King(x), King(John)}.
The substitution θ = {John/x} is a unifier for these atoms: applying it makes both expressions identical, namely King(John).
The UNIFY algorithm is used for unification, which takes two atomic sentences and returns a
unifier for those sentences (If any exist).
Unification is a key component of all first-order inference algorithms.
It returns fail if the expressions do not match with each other.
The simplest, most general such substitution is called the Most General Unifier (MGU).
Predicate symbol must be same, atoms or expression with different predicate symbol can never
be unified.
Number of Arguments in both expressions must be identical.
Unification will fail if there are two similar variables present in the same expression.
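A simplified Python sketch of the UNIFY procedure, assuming atoms are represented as tuples and variables as lowercase strings (a representation chosen only for illustration; the occurs check is omitted):

def is_variable(term):
    return isinstance(term, str) and term[:1].islower()      # e.g. "x", "y" are variables

def unify(x, y, subst=None):
    """Return the most general unifier of x and y as a dict, or None on failure."""
    if subst is None:
        subst = {}
    if x == y:
        return subst
    if is_variable(x):
        return unify_var(x, y, subst)
    if is_variable(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):                    # predicate and argument count must match
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None                                     # different predicates or constants: failure

def unify_var(var, term, subst):
    if var in subst:
        return unify(subst[var], term, subst)
    return {**subst, var: term}

print(unify(("King", "x"), ("King", "John")))       # MGU {'x': 'John'}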
5. Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs. Resolution is used when several statements are given and we need to prove a conclusion from those statements.
Unification is a key concept in proofs by resolutions. Resolution is a single inference rule which can
efficiently operate on the conjunctive normal form or clausal form.
Clause: A disjunction of literals is called a clause. A clause containing exactly one literal is known as a unit clause.
The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can
resolve two clauses if they contain complementary literals, which are assumed to be standardized apart
so that they share no variables.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
Example:
Here the two complementary literals are Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = [a/f(x), b/x], and resolving them generates a resolvent clause.
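For intuition, a minimal propositional version of the binary resolution rule in Python, with clauses as sets of literal strings and '~' marking negation; the first-order version additionally unifies the complementary literals as described above:

def resolve(c1, c2):
    """Return the resolvents of two propositional clauses given as sets of literals."""
    resolvents = []
    for lit in c1:
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        if complement in c2:
            resolvents.append((c1 - {lit}) | (c2 - {complement}))
    return resolvents

print(resolve({"P", "Q"}, {"~Q", "R"}))             # resolving on Q/¬Q gives [{'P', 'R'}]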
6. Inference in FOL
Inference in First-Order Logic is used to deduce new facts or sentences from existing sentences. Before
understanding the FOL inference rule, let's understand some basic terminologies used in FOL.
As in propositional logic, we also have inference rules in first-order logic; the following are some basic inference rules in FOL:
1. Universal Generalization:
Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary element c in the universe of discourse, then we can conclude ∀x P(x).
Example: Let P(c) be "A byte contains 8 bits"; then ∀x P(x), "All bytes contain 8 bits," is also true.
2. Universal Instantiation:
Universal instantiation is also called as universal elimination or UI is a valid inference rule. It can
be applied multiple times to add new sentences.
The new KB is logically equivalent to the previous KB.
As per UI, we can infer any sentence obtained by substituting a ground term for the variable.
The UI rule states that we can infer any sentence P(c) by substituting a ground term c (a constant within the domain of x) into ∀x P(x), for any object in the universe of discourse.
Example:
"All kings who are greedy are Evil." Suppose our knowledge base contains this detail in FOL form: ∀x King(x) ∧ Greedy(x) → Evil(x).
From this information we can infer, by Universal Instantiation, statements such as King(John) ∧ Greedy(John) → Evil(John) for any constant (e.g., John) in the domain.
3. Existential Instantiation:
Existential instantiation is also called as Existential Elimination, which is a valid inference rule in
first-order logic.
It can be applied only once to replace the existential sentence.
The new KB is not logically equivalent to old KB, but it will be satisfiable if old KB was satisfiable.
This rule states that one can infer P(c) from the formula given in the form of ∃x P(x) for a new
constant symbol c.
The restriction with this rule is that the constant c used in the rule must be a new term for which P(c) is true.
It can be represented as: from ∃x P(x), infer P(c).
Example: From ∃x Crown(x) ∧ OnHead(x, John), we can infer Crown(K) ∧ OnHead(K, John), as long as K does not appear anywhere else in the knowledge base.
4. Existential Introduction:
Existential introduction (also called existential generalization) states that if P(c) is true for some element c in the universe of discourse, then we can infer ∃x P(x).
7. Knowledge Representation
Knowledge representation and reasoning (KR, KRR) is the part of artificial intelligence concerned with how AI agents think and how thinking contributes to the intelligent behavior of agents.
It is responsible for representing information about the real world so that a computer can understand it and utilize this knowledge to solve complex real-world problems, such as diagnosing a medical condition or communicating with humans in natural language.
It is also a way which describes how we can represent knowledge in artificial intelligence.
Knowledge representation is not just storing data into some database, but it also enables an
intelligent machine to learn from that knowledge and experiences so that it can behave
intelligently like a human.
What to Represent:
Object: All the facts about objects in our world domain. E.g., guitars contain strings; trumpets are brass instruments.
Events: Events are the actions which occur in our world.
Performance: It describes behavior which involves knowledge about how to do things.
Meta-knowledge: It is knowledge about what we know.
Facts: Facts are the truths about the real world and what we represent.
Knowledge-Base: The central component of the knowledge-based agents is the knowledge
base. It is represented as KB. The Knowledgebase is a group of the Sentences (Here, sentences
are used as a technical term and not identical with the English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data, and
situations.
Types of knowledge:
1. Declarative Knowledge: Knowledge about something, expressed as facts, concepts, and objects in declarative sentences.
2. Procedural Knowledge: Knowledge about how to do something; it includes rules, strategies, and procedures.
3. Meta-knowledge: Knowledge about other types of knowledge.
4. Heuristic knowledge: Rules of thumb based on the experience of experts in a domain.
5. Structural knowledge: Basic problem-solving knowledge describing relationships between concepts, such as kind-of and part-of relations.
There are mainly four approaches to knowledge representation, which are given below:
1. Simple relational knowledge:
It is the simplest way of storing facts, using the relational method: each fact about a set of objects is set out systematically in columns.
This approach of knowledge representation is famous in database systems where the
relationship between different entities is represented.
This approach has little opportunity for inference.
2. Inheritable knowledge:
In the inheritable knowledge approach, all data must be stored into a hierarchy of classes.
All classes should be arranged in a generalized form or a hierarchal manner.
In this approach, we apply inheritance property.
Elements inherit values from other members of a class.
This approach contains inheritable knowledge which shows a relation between instance and
class, and it is called instance relation.
Every individual frame can represent the collection of attributes and its value.
In this approach, objects and values are represented in Boxed nodes.
We use Arrows which point from objects to their values.
3. Inferential knowledge:
The inferential knowledge approach represents knowledge in the form of formal logic, from which new facts can be derived, and it guarantees correctness. Example:
man(Marcus)
∀x: man(x) → mortal(x)
4. Procedural knowledge:
The procedural knowledge approach uses small programs and code which describe how to do specific things and how to proceed.
In this approach, one important rule is used which is If-Then rule.
In this knowledge, we can use various coding languages such as LISP language and Prolog
language.
We can easily represent heuristic or domain-specific knowledge using this approach.
But it is not necessary that we can represent all cases in this approach.
8. Non-monotonic Reasoning
In non-monotonic reasoning, some conclusions may be invalidated when more knowledge is added to the knowledge base, so the set of provable conclusions does not grow monotonically. For example, from "Birds can fly" and "Tweety is a bird" we conclude that Tweety can fly; adding the fact "Tweety is a penguin" forces us to retract that conclusion. Non-monotonic logic is therefore useful for representing default assumptions and incomplete knowledge.
9. English to Prolog Facts
Prolog, short for "Programming in Logic," is a declarative programming language primarily used in the
field of artificial intelligence. Unlike traditional imperative languages, Prolog focuses on expressing
relationships and logical rules rather than step-by-step instructions. Its distinctive feature is the ability to
perform automated reasoning and inference based on a set of rules and facts.
Prolog Facts:
In Prolog, facts are fundamental building blocks used to represent statements or truths. They can range
from simple declarations to complex relationships. Here's an exploration of Prolog facts:
1. Simple Facts:
sky_color(blue).
2. Binary Relations:
mother(mary, john).
4. Variable Usage:
chases(cat, mouse).
A query such as ?- chases(cat, Who). uses the variable Who, which Prolog binds to mouse.
5. Negation in Facts:
not(planet(sun)).
6. Rules with Arithmetic:
twice(X, Y) :- Y is 2 * X.
Prolog uses the `=` operator for matching (unifying) terms: two terms match if they are equal or can be made equal by assigning values to variables. The `=:=` operator compares the values of arithmetic expressions:
equation(X, Y) :- X + 3 =:= 2 * Y.
By combining these elements, Prolog provides a powerful framework for representing and reasoning
about relationships, making it particularly well-suited for applications in artificial intelligence, natural
language processing, and databases. Its declarative nature allows developers to focus on specifying what
they want, leaving the Prolog interpreter to determine how to achieve the desired outcomes.
10. Categories and Objects
Categories:
1. Definition:
- Categories are high-level, conceptual groupings that help organize knowledge. They represent classes
or sets of entities that share common characteristics.
2. Examples:
- Animals: This category encompasses various subcategories like mammals, birds, and reptiles.
3. Hierarchical Structure:
- Categories often have a hierarchical structure, forming a taxonomy. This structure allows for the
representation of broader and more specific categories.
Example:
% Animal hierarchy
is_a(mammal, animal).
is_a(bird, animal).
Objects:
1. Definition:
- Objects are instances or individual entities within a category. They represent specific occurrences or
items belonging to a particular class.
2. Examples:
- Golden Retriever: An instance of the "Dog" category under the broader "Mammal" category.
Example:
% Dog attributes (illustrative; predicate name assumed)
instance_of(golden_retriever, dog).
1. Semantic Networks:
- Example: Representing relationships between categories and objects using nodes and edges. Nodes
can represent categories, and edges signify relationships.
2. Frames:
- Example: Structuring information using frames, where each frame corresponds to a category or
object. Frames contain slots for attributes and properties.
Example:
3. Ontologies:
- Example: Developing ontologies to formally define relationships and hierarchies between different
categories and objects.
Example:
% Ontology relationships
has_parent(mammal, animal).
11. Reasoning System
Reasoning systems in computational intelligence play a crucial role in enabling machines to make
decisions, draw inferences, and solve problems. These systems emulate human-like reasoning
processes, allowing AI systems to navigate complex scenarios. Here's an overview of reasoning systems
in the context of artificial intelligence:
1. Symbolic Reasoning:
- Definition:
- Symbolic reasoning involves manipulating symbols and logical operations to derive new information
from existing knowledge.
- Example:
- Using logical rules to infer that if an entity is a bird and can fly, then it belongs to the category of
flying creatures.
2. Rule-Based Reasoning:
- Definition:
- Rule-based reasoning relies on a set of predefined rules to make decisions or draw conclusions.
- Example:
- If it's raining, carry an umbrella. This rule guides the decision-making process based on a specific
condition.
3. Case-Based Reasoning:
- Definition:
- Case-based reasoning involves solving new problems by recalling and adapting solutions from
similar past cases.
- Example:
- Diagnosing a medical condition based on similarities to previous cases with known outcomes.
4. Fuzzy Logic Reasoning:
- Definition:
- Fuzzy logic reasoning deals with uncertainty by allowing values to range between true and false,
enabling a more flexible approach to decision-making.
- Example:
5. Probabilistic Reasoning:
- Definition:
- Probabilistic reasoning involves assessing and calculating probabilities to make decisions under
uncertainty.
- Example:
6. Machine Learning Reasoning:
- Definition:
- Machine learning reasoning involves learning patterns and making predictions based on data
without explicit programming.
- Example:
- Training a model to recognize handwritten digits and making predictions on new, unseen data.
7. Abductive Reasoning:
- Definition:
- Abductive reasoning involves making the best explanation or hypothesis for observed facts or
evidence.
- Example:
8. Commonsense Reasoning:
- Definition:
- Commonsense reasoning enables AI systems to make deductions based on general knowledge and
common understanding.
- Example:
9. Inference Engines:
- Definition:
- Inference engines are components that process rules and information to draw conclusions in a
reasoning system.
- Example:
- Utilizing an inference engine to evaluate rules and deduce outcomes in a medical diagnostic system.
10. Automated Reasoning:
- Definition:
- Automated reasoning involves the use of algorithms and computational methods to perform logical
inference and solve problems.
- Example:
UNIT 3
1. Fuzzy Logic/Set/Rules
Fuzzy Logic is a form of many-valued logic in which the truth values of variables may be any real number
between 0 and 1, instead of just the traditional values of true or false. It is used to deal with imprecise
or uncertain information and is a mathematical method for representing vagueness and uncertainty in
decision-making. It allows for partial truths, where a statement can be partially true or false, rather than
fully true or false.
The fundamental concept of Fuzzy Logic is the membership function, which defines the degree of
membership of an input value to a certain set or category. The membership function is a mapping from
an input value to a membership degree between 0 and 1, where 0 represents non-membership and 1
represents full membership.
ARCHITECTURE
RULE BASE: It contains the set of rules and the IF-THEN conditions provided by the experts to
govern the decision-making system, on the basis of linguistic information. Recent developments
in fuzzy theory offer several effective methods for the design and tuning of fuzzy controllers.
Most of these developments reduce the number of fuzzy rules.
FUZZIFICATION: It is used to convert inputs i.e. crisp numbers into fuzzy sets. Crisp inputs are
basically the exact inputs measured by sensors and passed into the control system for
processing, such as temperature, pressure, rpm’s, etc.
INFERENCE ENGINE: It determines the matching degree of the current fuzzy input with respect
to each rule and decides which rules are to be fired according to the input field. Next, the fired
rules are combined to form the control actions.
DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by the inference engine into a
crisp value. There are several defuzzification methods available and the best-suited one is used
with a specific expert system to reduce the error.
Membership function
Definition: A graph that defines how each point in the input space is mapped to membership value
between 0 and 1. Input space is often referred to as the universe of discourse or universal set (u), which
contains all the possible elements of concern in each particular application.
What is Fuzzy Control? Fuzzy control is a control approach that uses fuzzy logic to adjust the output of a system based on imprecise or uncertain input data (it is discussed further in the Neuro-Fuzzy section below).
Advantages of Fuzzy Logic Systems:
This system can work with any type of inputs whether it is imprecise, distorted or noisy input
information.
The construction of Fuzzy Logic Systems is easy and understandable.
Fuzzy logic comes with mathematical concepts of set theory and the reasoning of that is quite
simple.
It provides a very efficient solution to complex problems in all fields of life as it resembles
human reasoning and decision-making.
The algorithms can be described with little data, so little memory is required.
Disadvantages of Fuzzy Logic Systems:
Many researchers have proposed different ways to solve a given problem through fuzzy logic, which leads to ambiguity; there is no systematic approach for solving a given problem through fuzzy logic.
Proof of its characteristics is difficult or impossible in most cases because every time we do not
get a mathematical description of our approach.
As fuzzy logic works on both precise and imprecise data, accuracy is often compromised.
Application
Fuzzy logic is used in Natural language processing and various intensive applications in Artificial
Intelligence.
Fuzzy logic is extensively used in modern control systems such as expert systems.
Fuzzy Logic is used with Neural Networks as it mimics how a person would make decisions, only
much faster. It is done by Aggregation of data and changing it into more meaningful data by
forming partial truths as Fuzzy sets.
Fuzzy set:
1. A fuzzy set is a set whose elements have degrees of membership between 0 and 1. Fuzzy sets are represented with a tilde character (~). For example, the proportion of cars following traffic signals at a particular time, out of all cars present, will have a membership value in [0, 1].
2. Partial membership exists when member of one fuzzy set can also be a part of other fuzzy sets in
the same universe.
3. The degree of membership or truth is not same as probability, fuzzy truth represents
membership in vaguely defined sets.
4. A fuzzy set A~ in the universe of discourse U can be defined as a set of ordered pairs:
A~ = {(y, μA~(y)) | y ∈ U}, where μA~(y) is the degree of membership of y in A~.
Given two fuzzy sets A~ and B~ and an element y of the universe, the following relations express the union, intersection and complement operations on fuzzy sets.
Union/Fuzzy ‘OR’
Let us consider the following representation to understand how the Union/Fuzzy ‘OR’ relation works −
μA˜∪B˜(y) = μA˜(y) ∨ μB˜(y), ∀ y ∈ U
Intersection/Fuzzy ‘AND’
Let us consider the following representation to understand how the Intersection/Fuzzy ‘AND’ relation works −
μA˜∩B˜(y) = μA˜(y) ∧ μB˜(y), ∀ y ∈ U
Complement/Fuzzy ‘NOT’
Let us consider the following representation to understand how the Complement/Fuzzy ‘NOT’ relation
works −
μ¬A˜(y) = 1 − μA˜(y), ∀ y ∈ U
Commutative Property
Associative Property
Distributive Property
Idempotency Property
Identity Property
Involution Property
De Morgan’s Law
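A short Python sketch of the union, intersection, and complement relations above on discrete fuzzy sets, represented as dictionaries mapping each element of the universe to its membership degree (the values are illustrative):

A = {"a": 0.2, "b": 0.7, "c": 1.0}                 # illustrative fuzzy sets over universe {a, b, c}
B = {"a": 0.5, "b": 0.3, "c": 0.8}

union        = {y: max(A[y], B[y]) for y in A}      # μA∪B(y) = μA(y) ∨ μB(y)
intersection = {y: min(A[y], B[y]) for y in A}      # μA∩B(y) = μA(y) ∧ μB(y)
complement_A = {y: round(1 - A[y], 2) for y in A}   # μ¬A(y) = 1 − μA(y)

print(union)                                        # {'a': 0.5, 'b': 0.7, 'c': 1.0}
print(intersection)                                 # {'a': 0.2, 'b': 0.3, 'c': 0.8}
print(complement_A)                                 # {'a': 0.8, 'b': 0.3, 'c': 0.0}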
2. Fuzzy Inference System
Fuzzy Inference System is the key unit of a fuzzy logic system having decision making as its primary work.
It uses the “IF…THEN” rules along with connectors “OR” or “AND” for drawing essential decision rules.
The output from FIS is always a fuzzy set irrespective of its input which can be fuzzy or crisp.
It is necessary to have fuzzy output when it is used as a controller.
A defuzzification unit would be there with FIS to convert fuzzy variables into crisp variables.
The following five functional blocks will help you understand the construction of FIS −
Working of FIS
A fuzzification unit supports the application of numerous fuzzification methods, and converts
the crisp input into fuzzy input.
A knowledge base - collection of rule base and database is formed upon the conversion of crisp
input into fuzzy input.
In the defuzzification unit, the fuzzy output is finally converted into a crisp output.
Methods of FIS
Let us now discuss the different methods of FIS. Following are the two important methods of FIS, having
different consequent of fuzzy rules −
Mamdani Fuzzy Inference System
This system was proposed in 1975 by Ebrahim Mamdani. It was originally designed to control a steam engine and boiler combination by synthesizing a set of fuzzy rules obtained from people working on the system.
Following steps need to be followed to compute the output from this FIS −
Takagi-Sugeno Fuzzy Model (TS Method)
This model was proposed by Takagi, Sugeno and Kang in 1985. The format of its rules is given as −
IF x is A and y is B THEN z = f(x, y)
Here, A and B are fuzzy sets in the antecedent, and z = f(x, y) is a crisp function in the consequent.
The fuzzy inference process under Takagi-Sugeno Fuzzy Model (TS Method) works in the following way −
Step 1: Fuzzifying the inputs − Here, the inputs of the system are made fuzzy.
Step 2: Applying the fuzzy operator − In this step, the fuzzy operators must be applied to get
the output.
Let us now understand the comparison between the Mamdani System and the Sugeno Model.
Output Membership Function − The main difference between them is on the basis of output
membership function. The Sugeno output membership functions are either linear or constant.
Aggregation and Defuzzification Procedure − The difference between them also lies in the
consequence of fuzzy rules and due to the same their aggregation and defuzzification procedure
also differs.
Mathematical Rules − More mathematical rules exist for the Sugeno rule than the Mamdani
rule.
Adjustable Parameters − The Sugeno controller has more adjustable parameters than the
Mamdani controller.
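A minimal zero-order Sugeno-style sketch in Python for a single input (room temperature driving fan speed); the membership functions, rule consequents, and numbers are illustrative assumptions rather than a full FIS implementation:

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temp):
    # Fuzzification: degree to which the temperature is 'cool', 'warm', 'hot'.
    firing = {"cool": tri(temp, 0, 10, 20),
              "warm": tri(temp, 15, 25, 35),
              "hot":  tri(temp, 30, 40, 50)}
    consequent = {"cool": 10, "warm": 50, "hot": 90}   # zero-order Sugeno constants (fan speed %)
    # Defuzzification: weighted average of the consequents by rule firing strength.
    num = sum(firing[r] * consequent[r] for r in firing)
    den = sum(firing.values()) or 1.0
    return num / den

print(fan_speed(32))                                   # 66.0, between the 'warm' and 'hot' outputs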
3. Neuro Fuzzy
A Neuro-Fuzzy Inference System (NFIS) is a type of artificial intelligence system that combines fuzzy logic
with neural network technology to improve the accuracy and performance of fuzzy inference systems.
The goal of a NFIS is to provide a more flexible and adaptive system that can better handle complex and
uncertain data.
ANFIS combines the ability of neural networks to learn from data with the ability of fuzzy logic to handle
uncertainty and imprecision in the data.
ANFIS uses a hybrid learning algorithm that combines the backpropagation (gradient descent) algorithm of neural networks with the least-squares method for adjusting the parameters of the fuzzy inference system. ANFIS is essentially a Sugeno-type fuzzy inference system with the parameters of the fuzzy rules determined by this hybrid learning procedure.
Fuzzy control is a type of control system that uses fuzzy logic to adjust the output of a system based on
input data that is uncertain or imprecise. The goal of fuzzy control is to provide a flexible and adaptable
control system that can handle changing conditions and uncertain data. It is a high level representation
language with local semantics and an interpreter/compiler to synthesize non-linear (control) surfaces.
The fuzzy control types are:
– Type I: RHS is a monotonic function
– Type II: RHS is a fuzzy set
– Type III: RHS is a (linear) function of state
Note that Type II fuzzy control must be tuned manually, while Type III fuzzy control (Takagi-Sugeno type) has automatic right-hand-side (RHS) tuning.
4. Temporal Logic and Temporal Reasoning
Temporal logic is a subfield of mathematical logic that deals with reasoning about time and the temporal relationships between events. In artificial intelligence, temporal logic is used as a formal language to describe and reason about the temporal behavior of systems and processes.
Temporal logic extends classical propositional and first-order logic with constructs for specifying
temporal relationships, such as “before,” “after,” “during,” and “until.” This allows for the expression of
temporal constraints and the modeling of temporal aspects of a system, such as its evolution over time
and the relationships between events.
Advantages of using Temporal Logic in Artificial Intelligence:
1. Formal specification: Temporal logic provides a formal language for specifying the desired
behavior of systems and processes, making it easier to ensure that these systems behave
correctly and satisfy the specified requirements.
2. Verification: Temporal logic can be used to verify that a system satisfies the specified temporal
properties, providing a rigorous method for checking the correctness of systems and reducing
the risk of errors.
3. Modeling: Temporal logic allows for the modeling of complex temporal behavior of systems and
processes, making it useful for a wide range of applications in artificial intelligence, such as
robotics and control systems.
4. Completeness: Temporal logic provides a complete system for reasoning about time, making it
well-suited for applications that involve temporal reasoning.
Disadvantages of using Temporal Logic in Artificial Intelligence:
1. Complexity: The formal syntax and semantics of temporal logic can be complex, making it
challenging to use for some applications and requiring a high level of mathematical expertise.
2. Limitations: Temporal logic is a formal language and may not be well-suited for certain
applications that involve uncertain or vague temporal relationships.
Temporal reasoning in AI refers to the ability of an artificial intelligence system to understand and
reason about events, actions, and relationships that occur over time. It involves the capability to
perceive, represent, and manipulate temporal information to make predictions, plan actions, or make
decisions in dynamic environments.
1. Temporal Logic: Temporal logic is a formal language used to reason about time and temporal
relationships. It provides a set of operators and rules for expressing temporal constraints, such
as "before," "after," "during," and "until." Temporal logic can be used to specify properties of
systems and verify their behavior.
2. Time Series Analysis: Time series analysis involves analyzing and forecasting data points
collected over time. It includes techniques such as autoregressive integrated moving average
(ARIMA), exponential smoothing, and recurrent neural networks (RNNs). Time series analysis
enables AI systems to detect patterns, trends, and anomalies in temporal data.
3. Temporal Probabilistic Models: Probabilistic models, such as hidden Markov models (HMMs)
and dynamic Bayesian networks (DBNs), can incorporate temporal information and uncertainty.
These models can capture dependencies between variables over time and make predictions
based on probabilistic reasoning.
4. Temporal Reasoning in Planning: Temporal reasoning is crucial in planning and scheduling
problems, where actions and events must be ordered and coordinated over time. Techniques
like temporal planning networks (TPNs) and temporal constraint networks (TCNs) help AI
systems reason about temporal constraints and create effective plans.
5. Temporal Reasoning in Natural Language Processing: Understanding and generating natural
language often requires temporal reasoning. Temporal expressions such as dates, durations, and
temporal relations play a significant role in language comprehension and generation tasks. AI
models need to interpret and reason about these temporal aspects to generate accurate and
contextually appropriate responses.
5. Back Propagation Neural Networks
Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes; therefore, it is simply referred to as the backward propagation of errors. It is used in many applications of neural networks in data mining, such as character recognition and signature verification.
Neural Network:
Neural networks are an information processing paradigm inspired by the human nervous
system. Just like in the human nervous system, we have biological neurons in the same way in
neural networks we have artificial neurons, artificial neurons are mathematical functions
derived from biological neurons. The human brain is estimated to have about 10 billion neurons, each connected to an average of 10,000 other neurons. Each neuron receives a signal through a synapse, which controls the effect of the signal on the neuron.
Features of Backpropagation:
• it is the gradient descent method as used in the case of simple perceptron network with
the differentiable unit.
• it is different from other networks in respect to the process by which the weights are
calculated during the learning period of the network.
• training is done in the three stages :
• the feed-forward of input training pattern
• the calculation and backpropagation of the error
• updation of the weight
• Working of Backpropagation:
• Neural networks use supervised learning to generate output vectors from the input vectors that the network operates on. It compares the generated output to the desired output and generates an error report if they do not match; it then adjusts the weights according to the error report to obtain the desired output.
Backpropagation Algorithm:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using real weights W; the weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error at the outputs: Error = Actual output − Desired output.
Step 5: Propagate the error back from the output layer to the hidden layer and adjust the weights so as to reduce the error.
Step 6: Repeat the process until the desired output is achieved.
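A compact numpy sketch of these steps for a network with one hidden layer, trained on the XOR problem; the layer sizes, learning rate, and sigmoid activation are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # Step 1: inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))            # Step 2: random initial weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0                                                      # learning rate (illustrative)

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)            # Step 3: feed-forward, input -> hidden
    out = sigmoid(h @ W2 + b2)          #         hidden -> output
    error = y - out                     # Step 4: error at the output
    d_out = error * out * (1 - out)     # Step 5: backpropagate the error ...
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 += lr * h.T @ d_out              # ... and adjust the weights (gradient step)
    b2 += lr * d_out.sum(axis=0, keepdims=True)
    W1 += lr * X.T @ d_h
    b1 += lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out).ravel())            # Step 6: after enough repetitions this usually matches [0. 1. 1. 0.]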
6. Conventional and Non-Conventional Reasoning Systems
Conventional (rule-based and symbolic) reasoning approaches:
1. Symbolic Logic:
- Description:
- Symbolic logic is a conventional reasoning approach that uses symbols and logical operations to represent and manipulate knowledge.
- Example:
- Utilizing propositional or first-order logic to express relationships and rules.
2. Rule-Based Systems:
- Description:
- Rule-based systems use a set of predefined rules to guide decision-making and inference.
- Example:
- Implementing rules such as "if condition A and condition B, then action C" in expert systems.
3. Decision Trees:
- Description:
- Decision trees are hierarchical structures that use a series of if-else conditions to make
decisions.
- Example:
- Constructing a decision tree for diagnosing medical conditions based on patient symptoms.
4. Expert Systems:
- Description:
- Expert systems emulate human expertise by encoding domain-specific knowledge in a set of
rules.
- Example:
- A medical expert system diagnosing diseases based on symptoms and medical history.
5. Predicate Logic:
- Description:
- Predicate logic extends symbolic logic by introducing predicates, allowing for more complex
expressions.
- Example:
- Expressing relationships with predicates like "is_a(animal, mammal)" in knowledge
representation.
Non-conventional (nature-inspired and learning-based) reasoning approaches:
2. Neural Networks:
- Description:
- Neural networks are computational models inspired by the human brain, capable of learning
patterns and making predictions based on data.
- Example:
- Training a neural network to recognize objects in images.
3. Genetic Algorithms:
- Description:
- Genetic algorithms mimic the process of natural selection to evolve solutions to
optimization and search problems.
- Example:
- Optimizing parameters for a complex system through iterative evolution.
4. Case-Based Reasoning:
- Description:
- Case-based reasoning involves solving new problems by recalling and adapting solutions
from similar past cases.
- Example:
- Diagnosing technical issues based on similarities to previously resolved cases.
5. Probabilistic Reasoning:
- Description:
- Probabilistic reasoning involves assessing and calculating probabilities to make decisions
under uncertainty.
- Example:
- Predicting the likelihood of a stock price increase based on historical data.
6. Swarm Intelligence:
- Description:
- Swarm intelligence models collective behavior observed in natural systems, such as ant
colonies or bird flocks, to solve problems.
- Example:
- Swarm robotics coordinating multiple robots to perform a task collectively.
Conventional and non-conventional reasoning systems in computational intelligence offer
diverse approaches to problem-solving and decision-making. While conventional systems rely on
rule-based and logical methods, non-conventional systems leverage approaches inspired by
nature and learning algorithms. The choice of reasoning system depends on the nature of the
problem, the type of data available, and the desired outcomes in a given AI application.
Conventional reasoning systems follow a rule-based approach, relying on explicit programming of rules and logical operations; non-conventional reasoning systems often adopt a learning-based approach, allowing systems to learn patterns and relationships from data.
In conventional systems, decisions are made based on predefined rules, making the reasoning process explicit and deterministic; non-conventional systems learn from examples, adapting and evolving their behavior over time.
In conventional systems, the decision-making process is often clear, providing a straightforward representation of knowledge and rules; non-conventional systems can manage imprecision, uncertainty, and incomplete information, providing more flexibility in decision-making.
UNIT - 4
Bayes' Theorem
Bayes' theorem is used to determine the probability of an event with uncertain knowledge. In probability theory, it relates the conditional probabilities and marginal probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an
application of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new information of
the real world.
Bayes' theorem is stated as:
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the probability of the evidence.
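A tiny worked example in Python, assuming illustrative numbers for a disease-test scenario (1% prevalence, 90% sensitivity, 8% false-positive rate):

p_d = 0.01                       # prior P(disease)
p_pos_given_d = 0.90             # likelihood P(positive | disease)
p_pos_given_not_d = 0.08         # false-positive rate P(positive | no disease)

p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)   # marginal P(positive)
posterior = p_pos_given_d * p_d / p_pos                        # Bayes' theorem
print(round(posterior, 3))       # 0.102: the positive test raises the 1% prior to about 10%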
Bayesian Belief Network
A Bayesian belief network is a key computer technology for dealing with probabilistic events and solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their
conditional dependencies using a directed acyclic graph."
Bayesian networks are probabilistic, because these networks are built from a probability distribution,
and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple
events, we need a Bayesian network. It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series prediction, and decision making
under uncertainty.
Bayesian networks can be used for building models from data and experts' opinions, and a network consists of
two parts: the Directed Acyclic Graph (DAG) and the table of conditional probabilities (CPT).
The generalized form of a Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
Each node corresponds to a random variable, and a variable can be continuous or discrete.
Arcs or directed arrows represent the causal relationships or conditional probabilities between
random variables. These directed links or arrows connect pairs of nodes in the graph. A link
indicates that one node directly influences the other node; if there is no directed link between
two nodes, they are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
o If we are considering node B, which is connected with node A by a directed arrow,
then node A is called the parent of Node B.
o Node C is independent of node A.
A worked numerical example (sum) is available at https://www.javatpoint.com/bayesian-belief-network-in-artificial-intelligence
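The factorisation behind a Bayesian network can be sketched in a few lines of Python: the joint probability of all variables is the product of each node's conditional probability given its parents. The three-node chain and its probability tables below are a separate hypothetical example, unrelated to the diagram described above.

```python
# Minimal sketch of the chain-rule factorisation used by a Bayesian network:
# P(X1, ..., Xn) = product over i of P(Xi | Parents(Xi)).
# The three-node chain A -> B -> C and all numbers are hypothetical.

p_a = {True: 0.3, False: 0.7}                      # P(A)
p_b_given_a = {True: {True: 0.8, False: 0.2},      # P(B | A)
               False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.5, False: 0.5},      # P(C | B)
               False: {True: 0.05, False: 0.95}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) from the conditional probability tables above."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Probability that all three variables are True.
print(joint(True, True, True))   # 0.3 * 0.8 * 0.5 = 0.12
```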
Hidden Markov Models (HMMs) are a type of probabilistic model that are commonly used in machine
learning for tasks such as speech recognition, natural language processing, and bioinformatics. They are
a popular choice for modelling sequences of data because they can effectively capture the underlying
structure of the data, even when the data is noisy or incomplete. In this article, we will give a
comprehensive overview of Hidden Markov Models, including their mathematical foundations,
applications, and limitations.
A Hidden Markov Model (HMM) is a probabilistic model that consists of a sequence of hidden states,
each of which generates an observation. The hidden states are usually not directly observable, and the
goal of HMM is to estimate the sequence of hidden states based on a sequence of observations. An
HMM is defined by the following components:
The basic idea behind an HMM is that the hidden states generate the observations, and the observed
data is used to estimate the hidden state sequence. This is often referred to as the forward-backwards
algorithm.
The Hidden Markov Model (HMM) algorithm can be implemented using the following steps:
1. Define the state space and observation space: the state space is the set of all possible hidden states, and the observation space is the set of all possible observations.
2. Define the transition probabilities: these are the probabilities of transitioning from one state to another. They form the transition matrix, which describes the probability of moving from one state to another.
3. Define the emission probabilities: these are the probabilities of generating each observation from each state. They form the emission matrix, which describes the probability of generating each observation from each state.
4. Decode the hidden states: given the observed data, the Viterbi algorithm is used to compute the most likely sequence of hidden states. This can be used to predict future observations, classify sequences, or detect patterns in sequential data.
5. Evaluate the model: the performance of the HMM can be evaluated using various metrics, such as accuracy, precision, recall, or F1 score.
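As a minimal sketch of the decoding step, the snippet below runs the Viterbi algorithm on a tiny hypothetical two-state weather HMM; the states, observations, and all probabilities are made up for illustration.

```python
# Minimal sketch of Viterbi decoding for a tiny hypothetical HMM.
states = ["Rainy", "Sunny"]
observations = ["walk", "shop", "clean"]

start_p = {"Rainy": 0.6, "Sunny": 0.4}                        # initial distribution
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},             # transition matrix
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},  # emission matrix
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    """Return the most likely hidden state sequence for the observations."""
    # V[t][s] = (best probability of a path ending in state s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, V[t - 1][prev][1] + [s])
    return max(V[-1].values())

print(viterbi(observations))  # (0.01344, ['Sunny', 'Rainy', 'Rainy'])
```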
Speech Recognition - HMMs are widely used in speech recognition systems. The model is
trained on a large corpus of speech data, and the transitions between phonemes are modeled
using a Markov process. The output of the model is a sequence of phonemes, which can be used
to recognize words and sentences.
Natural Language Processing - HMMs are also used in natural language processing (NLP)
applications such as part-of-speech tagging and named entity recognition. In these applications,
the HMM is trained on a corpus of text data, and the model learns to predict the sequence of
parts of speech or named entities based on the context of the input text.
Bioinformatics - HMMs have applications in bioinformatics, where they are used for protein
structure prediction and sequence alignment. In protein structure prediction, the HMM is
trained on a set of known protein structures, and the model is used to predict the structure of
new proteins. In sequence alignment, HMMs are used to find the best match between two or
more protein or DNA sequences.
Finance - HMMs have applications in finance, where they are used for modeling stock prices,
interest rates, and credit risk. In stock price modeling, the HMM is used to predict the future
movement of stock prices based on historical data. In interest rate modeling, the HMM is used
to predict changes in interest rates based on economic factors. In credit risk modeling, the HMM
is used to predict the probability of default based on the borrower's credit history.
3. EM Algorithm
A latent variable model consists of both observable and unobservable variables where observable can
be predicted while unobserved are inferred from the observed variable. These unobservable variables
are known as latent variables.
Being an iterative approach, it consists of two modes. In the first mode, we estimate the missing or
latent variables. Hence it is referred to as the Expectation/estimation step (E-step). Further, the other
mode is used to optimize the parameters of the models so that it can explain the data more clearly. The
second mode is known as the maximization-step or M-step.
Convergence here has its intuitive probabilistic meaning: if two random variables have only a very small
difference in their probability, they are said to have converged. In other words, whenever the values of the
given variables stop changing and match each other from one iteration to the next, it is called convergence.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps, which include Initialization Step, Expectation Step,
Maximization Step, and convergence Step. These steps are explained as follows:
1st Step: The very first step is to initialize the parameter values. Further, the system is provided
with incomplete observed data with the assumption that data is obtained from a specific model.
2nd Step: This step is known as Expectation or E-Step, which is used to estimate or guess the
values of the missing or incomplete data using the observed data. Further, E-step primarily
updates the variables.
3rd Step: This step is known as Maximization or M-step, where we use complete data obtained
from the 2nd step to update the parameter values. Further, M-step primarily updates the
hypothesis.
4th step: The last step is to check if the values of latent variables are converging or not. If it gets
"yes", then stop the process; else, repeat the process from step 2 until the convergence occurs.
4. Reinforcement Learning
Reinforcement learning is used to find the best possible behavior or path an agent should take in a specific
situation. Reinforcement learning differs from supervised learning in that, in supervised learning, the
training data comes with the answer key, so the model is trained with the correct answers, whereas in
reinforcement learning there is no answer key and the reinforcement agent decides what to do to perform
the given task. In the absence of a training dataset, it is bound to learn from its experience.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error.
It performs actions with the aim of maximizing rewards, or in other words, it is learning by doing in
order to achieve the best outcomes.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next.
After each action, the algorithm receives feedback that helps it determine whether the choice it made
was correct, neutral or incorrect. It is a good technique to use for automated systems that have to make
a lot of small decisions without human guidance.
Elements of Reinforcement Learning:
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping from perceived
states of the environment to actions to be taken when in those states.
Reward function: Reward function is used to define a goal in a reinforcement learning problem. A
reward function is a function that provides a numerical score based on the state of the environment
Value function: Value functions specify what is good in the long run. The value of a state is the total
amount of reward an agent can expect to accumulate over the future, starting from that state.
The agent has sensors to decide on its state in the environment and takes action that modifies its state.
The reinforcement learning problem is modelled as an agent continuously interacting with an
environment. The agent and the environment interact in a sequence of time steps. At each time
step t, the agent receives the state of the environment and a scalar numerical reward for the
previous action, and then selects an action.
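One common way to realize this loop is tabular Q-learning; the sketch below uses a hypothetical five-cell corridor environment in which the agent is rewarded only for reaching the rightmost cell, and every hyperparameter is an arbitrary illustrative choice.

```python
# Minimal sketch of tabular Q-learning on a hypothetical five-cell corridor:
# the agent starts in cell 0 and gets a reward of +1 only when it reaches cell 4.
import random

n_states, actions = 5, [-1, +1]           # actions: move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

def greedy_action(state):
    """Pick the best-valued action, breaking ties at random."""
    best = max(Q[(state, a)] for a in actions)
    return random.choice([a for a in actions if Q[(state, a)] == best])

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy: mostly exploit the learned values, sometimes explore.
        action = random.choice(actions) if random.random() < epsilon else greedy_action(state)
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The learned greedy policy should move right (+1) from every non-terminal cell.
print([greedy_action(s) for s in range(n_states - 1)])
```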
5. SVM
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is
used for Classification as well as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the
below diagram, in which two different categories are classified using a decision boundary or
hyperplane:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data points. This
best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the number of features present in the dataset: if there
are 2 features (as shown in the image), the hyperplane will be a straight line, and if there are 3 features,
the hyperplane will be a two-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e. the maximum distance between
the hyperplane and the nearest data points of either class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the
hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called
support vectors.
Types of SVM
Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be
classified into two classes by using a single straight line, then such data is termed linearly
separable data, and the classifier used is called a Linear SVM classifier.
Linear SVMs work by finding a hyperplane that separates data points into different classes in a
linearly separable space. The hyperplane is a linear equation that is defined by the weights and
biases of the model. The decision boundary is a straight line that separates the classes. Linear
SVMs are efficient and easy to interpret, but they can only be used for linearly separable data.
Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means that if a
dataset cannot be classified by using a straight line, then such data is termed non-linear data,
and the classifier used is called a Non-linear SVM classifier.
Non-linear SVMs are used when the data is not linearly separable. In non-linear SVMs, the data
is mapped to a higher-dimensional feature space where it becomes linearly separable. The
mapping is done using a kernel function, which transforms the data from the input space to a
higher-dimensional space. Once the data is mapped, a hyperplane is used to separate the
classes. The decision boundary is a curved line that separates the classes. Non-linear SVMs are
more powerful than linear SVMs, but they are also more computationally expensive.
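A small sketch of both variants, assuming scikit-learn as the library (an assumption; the notes do not prescribe one), could look like this. The "moons" dataset is a standard non-linearly separable toy example.

```python
# Minimal sketch of linear vs. non-linear (kernel) SVMs using scikit-learn.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy two-class dataset that is NOT linearly separable.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF-kernel SVM accuracy:", rbf_svm.score(X_test, y_test))
print("number of support vectors:", rbf_svm.n_support_)
```

On this data the RBF-kernel SVM should score noticeably higher than the linear one, mirroring the trade-off described above.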
6. Decision Tree
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
In a Decision tree, there are two types of nodes, which are the Decision Node and the Leaf Node. Decision
nodes are used to make decisions and have multiple branches, whereas Leaf nodes are the
outputs of those decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
Below diagram explains the general structure of a decision tree:
Root Node: Root node is from where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further
after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.
In a decision tree, for predicting the class of a given dataset, the algorithm starts from the root node
of the tree. The algorithm compares the value of the root attribute with the corresponding attribute of
the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves
further. It continues this process until it reaches a leaf node of the tree. The complete process can be
better understood using the below algorithm:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where you cannot classify the nodes any further;
the final node is then called a leaf node.
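Step-2 above leaves the Attribute Selection Measure open; one common choice is information gain based on entropy. The sketch below computes it for a tiny hypothetical dataset (the attribute names and rows are made up for illustration).

```python
# Minimal sketch of one common Attribute Selection Measure: information gain.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="label"):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

data = [
    {"outlook": "sunny",    "windy": False, "label": "no"},
    {"outlook": "sunny",    "windy": True,  "label": "no"},
    {"outlook": "overcast", "windy": False, "label": "yes"},
    {"outlook": "rain",     "windy": False, "label": "yes"},
    {"outlook": "rain",     "windy": True,  "label": "no"},
]

# The attribute with the highest gain would become the decision node.
print(information_gain(data, "outlook"), information_gain(data, "windy"))
```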
Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal decision
tree.
A too-large tree increases the risk of overfitting, while a small tree may not capture all the important
features of the dataset. A technique that decreases the size of the learning tree without reducing
accuracy is therefore known as pruning. There are mainly two tree pruning techniques used: Cost
Complexity Pruning and Reduced Error Pruning.
Advantages of the Decision Tree:
It is simple to understand, as it follows the same process which a human follows while making a
decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
There is less requirement of data cleaning compared to other algorithms.
7. Statistical Learning
Statistical learning is a fundamental concept in artificial intelligence (AI) and machine learning (ML). It
refers to the process of developing algorithms and models that can automatically learn patterns and
make predictions or decisions from data. Statistical learning methods use statistical techniques to
analyze and interpret data, allowing machines to learn from examples and make informed decisions.
In AI, statistical learning is often used in supervised learning tasks, where the algorithm learns from
labeled training data to make predictions or classify new, unseen data points. The algorithm learns a
function or model that maps input features to output labels by optimizing a specific objective or loss
function. Examples of supervised learning algorithms include:
Linear Regression: Linear regression is used for predicting a continuous dependent variable
based on one or more independent variables. It fits a linear equation to the data by minimizing
the sum of squared differences between the observed and predicted values.
Logistic Regression: Logistic regression is used for binary classification problems. It models the
relationship between a set of independent variables and the probability of a binary outcome
using a logistic function.
Decision Trees: Decision trees are versatile algorithms used for both classification and regression
tasks. They split the data based on the values of input features to create a tree-like model that
can be used for predictions.
Support Vector Machines (SVM): SVM is a powerful algorithm used for both classification and
regression tasks. It finds an optimal hyperplane that separates the data into different classes
while maximizing the margin between the classes.
Naive Bayes: Naive Bayes is a probabilistic classifier based on Bayes' theorem. It assumes that
features are independent of each other, which is a naive assumption, but often performs well in
practice.
In statistical learning, the underlying assumption is that the observed data has some underlying
statistical properties and patterns. By using statistical techniques, such as probability theory,
optimization methods, and hypothesis testing, algorithms can uncover these patterns and make
predictions or inferences about future or unseen data.
Statistical learning also encompasses unsupervised learning, where the algorithm learns from unlabeled
data to discover patterns, structures, or relationships within the data. Clustering algorithms, such as k-
means clustering or hierarchical clustering, and dimensionality reduction techniques, such as principal
component analysis (PCA), are common examples of unsupervised learning algorithms.
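As a minimal sketch of the k-means clustering mentioned above, the following NumPy snippet alternates the assignment and update steps on two synthetic blobs; the data and the choice of k = 2 are arbitrary.

```python
# Minimal sketch of k-means clustering (unsupervised learning) with NumPy.
import numpy as np

rng = np.random.default_rng(42)
points = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
                    rng.normal([3, 3], 0.5, (100, 2))])

k = 2
centroids = points[[0, 100]].copy()   # one initial centroid from each blob

for _ in range(20):
    # Assignment step: attach each point to its nearest centroid.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])

print(centroids)   # should end up near (0, 0) and (3, 3)
```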
1. Finance: Statistical learning is used for credit scoring, fraud detection, stock market prediction,
portfolio optimization, and risk assessment.
2. Healthcare: It is employed for medical image analysis, disease diagnosis, patient monitoring,
drug discovery, and personalized medicine.
3. Marketing and Advertising: Statistical learning helps in customer segmentation, market analysis,
targeted advertising, recommender systems, and customer churn prediction.
4. Natural Language Processing (NLP): Statistical learning techniques are used for sentiment
analysis, text classification, information extraction, machine translation, and speech recognition.
8. Regression and Classification
Regression and Classification algorithms are Supervised Learning algorithms. Both the algorithms are
used for prediction in Machine learning and work with the labeled datasets. But the difference between
both is how they are used for different machine learning problems.
Classification:
Classification is a process of finding a function which helps in dividing the dataset into classes based on
different parameters. In Classification, a computer program is trained on the training dataset and based
on that training, it categorizes the data into different classes.
The task of the classification algorithm is to find the mapping function to map the input(x) to the
discrete output(y).
Example: The best example to understand the Classification problem is Email Spam Detection. The
model is trained on the basis of millions of emails on different parameters, and whenever it receives a
new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the
Spam folder.
Some popular classification algorithms are:
Logistic Regression
K-Nearest Neighbours
Support Vector Machines
Kernel SVM
Naïve Bayes
Decision Tree Classification
Random Forest Classification
Regression:
Regression is a process of finding the correlations between dependent and independent variables. It
helps in predicting the continuous variables such as prediction of Market Trends, prediction of House
prices, etc.
The task of the Regression algorithm is to find the mapping function to map the input variable(x) to the
continuous output variable(y).
Example: Suppose we want to do weather forecasting, so for this, we will use the Regression algorithm.
In weather prediction, the model is trained on the past data, and once the training is completed, it can
easily predict the weather for future days.
Supervised learning is a machine learning method in which models are trained using labeled data. In
supervised learning, models need to find the mapping function to map the input variable (X) with the
output variable (Y).
In supervised learning, models are trained using a labelled dataset, where the model learns about each
type of data. Once the training process is completed, the model is tested on the basis of test data (data
held out from training), and then it predicts the output.
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the output
variable. It is used for the prediction of continuous variables, such as Weather forecasting, Market
Trends, etc. Below are some popular Regression algorithms which come under supervised learning:
Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means there are two
classes, such as Yes-No, Male-Female, True-False, etc. Spam filtering is a typical example. Popular
classification algorithms which come under supervised learning include:
Random Forest
Decision Trees
Logistic Regression
Support Vector Machines
Example: Suppose we have an image of different types of fruits. The task of our supervised learning
model is to identify the fruits and classify them accordingly. So to identify the image in supervised
learning, we will give the input data as well as output for that, which means we will train the model by
the shape, size, color, and taste of each fruit. Once the training is completed, we will test the model by
giving the new set of fruit. The model will identify the fruit and predict the output using a suitable
algorithm.
Unsupervised Machine Learning:
Unsupervised learning is another machine learning method in which patterns are inferred from
unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the
input data. Unsupervised learning does not need any supervision. Instead, it finds patterns from the
data on its own.
Clustering: Clustering is a method of grouping objects into clusters such that the objects with
the most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them
as per the presence and absence of those commonalities.
Association: An association rule is an unsupervised learning method which is used for finding
relationships between variables in a large database. It determines the set of items that
occur together in the dataset. Association rules make marketing strategies more effective; for
example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A
typical example of an association rule is Market Basket Analysis.
Example: So unlike supervised learning, here we will not provide any supervision to the model. We will
just provide the input dataset to the model and allow the model to find the patterns from the data. With
the help of a suitable algorithm, the model will train itself and divide the fruits into different groups
according to the most similar features between them.
9. ANN
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a
series of layers that together constitute the whole Artificial Neural Network in a system. A layer can
have just a dozen units or millions of units, depending on how complex the neural network must be to
learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input
layer, an output layer as well as hidden layers. The input layer receives data from the outside world
which the neural network needs to analyze or learn about. Then this data passes through one or
multiple hidden layers that transform the input into data that is valuable for the output layer. Finally,
the output layer provides an output in the form of a response of the Artificial Neural Networks to input
data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these
connections has weights that determine the influence of one unit on another unit. As the data transfers
from one unit to another, the neural network learns more and more about the data which eventually
results in an output from the output layer.
The structures and operations of human neurons serve as the basis for artificial neural networks. It is
also known as neural networks or neural nets. The input layer of an artificial neural network is the first
layer, and it receives input from external sources and releases it to the hidden layer, which is the second
layer. In the hidden layer, each neuron receives input from the previous layer neurons, computes the
weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning
the effect of each input from the previous layer is scaled up or down by the weight assigned to it, and
these weights are adjusted during the training process to improve model performance.
Structure: The structure of artificial neural networks is inspired by biological neurons. A biological
neuron has a cell body or soma to process the impulses, dendrites to receive them, and an axon that
transfers them to other neurons. The input nodes of artificial neural networks receive input signals, the
hidden layer nodes compute these input signals, and the output layer nodes compute the final output
by processing the hidden layer’s results using activation functions.
Synapses: Synapses are the links between biological neurons that enable the transmission of impulses
from dendrites to the cell body. Synapses are the weights that join the one-layer nodes to the next-layer
nodes in artificial neurons. The strength of the links is determined by the weight value.
Learning: In biological neurons, learning happens in the cell body nucleus or soma, which has a nucleus
that helps to process the impulses. An action potential is produced and travels through the axons if the
impulses are powerful enough to reach the threshold. This becomes possible by synaptic plasticity,
which represents the ability of synapses to become stronger or weaker over time in reaction to changes
in their activity. In artificial neural networks, backpropagation is a technique used for learning, which
adjusts the weights between nodes according to the error or differences between predicted and actual
outcomes.
Activation: In biological neurons, activation is the firing rate of the neuron which happens when the
impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical
function known as an activation function maps the input to the output and executes activations.
Feedforward Neural Network: The feedforward neural network is one of the most basic artificial neural
networks. In this ANN, the data or the input provided travels in a single direction. It enters into the ANN
through the input layer and exits through the output layer while hidden layers may or may not exist. So
the feedforward neural network has a front-propagated wave only and usually does not have
backpropagation.
Convolutional Neural Network: A Convolutional neural network has some similarities to the feed-
forward neural network, where the connections between units have weights that determine the
influence of one unit on another unit. But a CNN has one or more than one convolutional layer that uses
a convolution operation on the input and then passes the result obtained in the form of output to the
next layer. CNN has applications in speech and image processing which is particularly useful in computer
vision.
Modular Neural Network: A Modular Neural Network contains a collection of different neural networks
that work independently towards obtaining the output with no interaction between them. Each of the
different neural networks performs a different sub-task by obtaining unique inputs compared to other
networks. The advantage of this modular neural network is that it breaks down a large and complex
computational process into smaller components, thus decreasing its complexity while still obtaining the
required output.
Radial basis function Neural Network: Radial basis functions are functions whose value depends on the
distance of a point from a center. RBF networks have two layers. In the first layer, the input is
mapped into all the Radial basis functions in the hidden layer and then the output layer computes the
output in the next step. Radial basis function nets are normally used to model the data that represents
any underlying trend or function.
Recurrent Neural Network: The Recurrent Neural Network saves the output of a layer and feeds this
output back to the input to better predict the outcome of the layer. The first layer in the RNN is quite
similar to the feed-forward neural network and the recurrent neural network starts once the output of
the first layer is computed. After this layer, each unit will remember some information from the
previous step so that it can act as a memory cell in performing computations.
Applications of Artificial Neural Networks
Social Media: Artificial Neural Networks are used heavily in Social Media. For example, let’s take the
‘People you may know’ feature on Facebook that suggests people that you might know in real life so
that you can send them friend requests. Well, this magical effect is achieved by using Artificial Neural
Networks that analyze your profile, your interests, your current friends, and also their friends and
various other factors to calculate the people you might potentially know. Another common application
of Machine Learning in social media is facial recognition. This is done by finding around 100 reference
points on the person’s face and then matching them with those already available in the database using
convolutional neural networks.
Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend
products for you to buy based on your previous browsing history. Similarly, suppose you love
Pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and
previous order history. This is true across all new-age marketing segments like Book sites, Movie
services, Hospitality sites, etc. and it is done by implementing personalized marketing. This uses Artificial
Neural Networks to identify the customer likes, dislikes, previous shopping history, etc., and then tailor
the marketing campaigns accordingly.
Healthcare: Artificial Neural Networks are used in Oncology to train algorithms that can identify
cancerous tissue at the microscopic level at the same accuracy as trained physicians. Various rare
diseases may manifest in physical characteristics and can be identified in their premature stages by
using Facial Analysis on the patient photos. So the full-scale implementation of Artificial Neural
Networks in the healthcare environment can only enhance the diagnostic abilities of medical experts
and ultimately lead to the overall improvement in the quality of medical care all over the world.
Personal Assistants: You have surely heard of Siri, Alexa, Cortana, etc., and have probably used one of
them on your phone. These are personal assistants and an example of speech recognition that uses
Natural Language Processing to interact with the users and formulate a response accordingly. Natural
Language Processing uses artificial neural networks that are made to handle many tasks of these
personal assistants such as managing the language syntax, semantics, correct speech, the conversation
that is going on, etc.
10. Feed Forward Neural Network
"The process of receiving an input to produce some kind of output to make some kind of prediction is
known as Feed Forward." Feed Forward neural network is the core of many other important neural
networks such as convolution neural network.
In the feed-forward neural network, there are no feedback loops or connections in the network.
There is simply an input layer, a hidden layer, and an output layer.
• Input Layer:
The input layer accepts the input data and passes it to the next layer.
• Hidden Layers:
One or more hidden layers that process and transform the input data. Each hidden layer has a
set of neurons connected to the neurons of the previous and next layers. These layers use
activation functions, such as ReLU or sigmoid, to introduce non-linearity into the network,
allowing it to learn and model more complex relationships between the inputs and outputs.
• Output Layer:
The output layer generates the final output. Depending on the type of problem, the number of
neurons in the output layer may vary. For example, in a binary classification problem, it would
only have one neuron. In contrast, a multi-class classification problem would have as many
neurons as the number of classes.
The purpose of a feedforward neural network is to approximate certain functions. The input to the
network is a vector of values, x, which is passed through the network, layer by layer, and transformed
into an output, y. The network's final output predicts the target function for the given input. The
network makes this prediction using a set of parameters, θ (theta), adjusted during training to minimize
the error between the network's predictions and the target function.
The training involves adjusting the θ (theta) values to minimize errors. This is done by presenting the
network with a set of input-output pairs (also called training data) and computing the error between
the network's prediction and the true output for each pair. This error is then used to compute
the gradient of the error concerning the parameters, which tells us how to adjust the parameters to
reduce the error. This is done using optimization techniques like gradient descent. Once the training
process is completed, the network has " learned " the function and can be used to predict new input.
Finally, the network stores this optimal value of θ (theta) in its memory, so it can use it to predict new
inputs.
• I:
Input node (the starting point for data entering the neural network)
• W:
Connection weight (used to determine the strength of the connection between nodes)
• H:
Hidden node (a layer within the network that processes input)
• HA:
Activated hidden node (the value of the hidden node after passing through a predefined
function)
• O:
Output node (the final output of the network, calculated as a weighted sum of the last hidden
layer)
• OA:
Activated output node (the final output of the network after passing through a predefined
function)
• B:
Bias node (a constant value, typically set to 1.0, used to adjust the output of the network)
Activation Function:
- Common Types:
- Sigmoid: Maps to values between 0 and 1 (for binary classification).
- ReLU: Outputs input for positive values, 0 for negatives (common and computationally efficient).
- Usage: Each neuron can have its own activation function, but it is common to use the same function for all
neurons in a layer. In some cases, a linear activation function is used in the output layer for regression
problems.
Training
To train a feedforward neural network, the following steps are typically followed:
1. Initialize the weights and biases of the network.
2. Feed the training inputs forward through the network to obtain predictions.
3. Compute the error (loss) between the predictions and the true outputs.
4. Backpropagate the error to obtain the gradient of the loss with respect to the parameters θ.
5. Update the parameters using an optimization technique such as gradient descent, and repeat until the error stops decreasing.
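A minimal NumPy sketch of this loop for a tiny one-hidden-layer network on the XOR problem is shown below; the layer sizes, learning rate, and data are arbitrary choices for illustration.

```python
# Minimal sketch of training a tiny feed-forward network (one hidden layer,
# sigmoid activations) with gradient descent on the XOR problem.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))   # input  -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))   # hidden -> output
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    # Feed forward: input layer -> hidden layer -> output layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: error gradients flow from the output layer backwards.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent parameter updates (the theta values in the text).
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))   # predictions typically approach [[0], [1], [1], [0]]
```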
1. NLP
NLP stands for Natural Language Processing, which is a part of Computer Science, Human language,
and Artificial Intelligence. It is the technology that is used by machines to understand, analyse,
manipulate, and interpret human's languages. It helps developers to organize knowledge for performing
tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech
recognition, relationship extraction, and topic segmentation.
Advantages of NLP
NLP helps users to ask questions about any subject and get a direct response within seconds.
NLP offers exact answers to the question means it does not offer unnecessary and unwanted
information.
NLP helps computers to communicate with humans in their languages.
It is very time efficient.
Most companies use NLP to improve the efficiency of documentation processes and the accuracy
of documentation, and to identify information from large databases.
Disadvantages of NLP
Components of NLP
Natural Language Understanding (NLU) helps the machine to understand and analyse human language
by extracting the metadata from content such as concepts, entities, keywords, emotion, relations, and
semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken and
written language.
Natural Language Generation (NLG) acts as a translator that converts the computerized data into natural
language representation. It mainly involves Text planning, Sentence planning, and Text Realization.
Applications of NLP
1. Question Answering
2. Spam Detection
3. Sentiment Analysis
4. Machine Translation
Step 1: Sentence Segmentation
Sentence segmentation is the first step in building the NLP pipeline. It breaks the paragraph into separate
sentences. For example:
Independence Day is one of the important festivals for every Indian citizen. It is celebrated on the
15th of August each year ever since India got independence from the British rule. The day celebrates
independence in the true sense.
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the
British rule."
3. "This day celebrates independence in the true sense."
Step 2: Word Tokenization
Word tokenization is used to break a sentence into separate words or tokens.
Example:
JavaTpoint offers Corporate Training, Summer Training, Online Training, and Winter Training.
Step 3: Stemming
Stemming is used to normalize words into their base form or root form. For example, celebrates,
celebrated and celebrating all originate from the single root word "celebrate." The big problem with
stemming is that sometimes it produces a root word which may not have any meaning. For example,
intelligence, intelligent, and intelligently all reduce to the single root "intelligen," and in English the
word "intelligen" does not have any meaning.
Step 4: Lemmatization
Lemmatization is quite similar to stemming. It is used to group different inflected forms of a word,
called the lemma. The main difference between stemming and lemmatization is that lemmatization
produces a root word which has a meaning.
For example: in lemmatization, the words intelligence, intelligent, and intelligently have the root word
intelligent, which has a meaning.
Step 5: Identifying Stop Words
In English, there are a lot of words that appear very frequently, like "is", "and", "the", and "a". NLP
pipelines will flag these words as stop words. Stop words might be filtered out before doing any
statistical analysis.
Note: When you are building a rock band search engine, then you do not ignore the word "The."
Step 7: POS Tags
POS stands for parts of speech, which include noun, verb, adverb, and adjective. It indicates how a
word functions, both in meaning and grammatically, within the sentence. A word can have one or more
parts of speech depending on the context in which it is used.
Step 8: Named Entity Recognition (NER)
Named Entity Recognition (NER) is the process of detecting named entities such as a person name,
movie name, organization name, or location.
Example: Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.
Step 9: Chunking
Chunking is used to collect individual pieces of information and group them into bigger pieces of the
sentence.
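The pipeline steps above can be sketched with NLTK (an assumed library choice; the notes do not name one). The required NLTK data packages are noted in the comments and must be downloaded once with nltk.download(...).

```python
# Minimal sketch of the NLP pipeline steps using NLTK.
import nltk
from nltk.corpus import stopwords                        # needs 'stopwords'
from nltk.stem import PorterStemmer, WordNetLemmatizer   # lemmatizer needs 'wordnet'
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk
# sent/word_tokenize need 'punkt'; pos_tag needs 'averaged_perceptron_tagger';
# ne_chunk needs 'maxent_ne_chunker' and 'words'.

text = ("Steve Jobs introduced iPhone at the Macworld Conference "
        "in San Francisco, California. It was celebrated widely.")

sentences = sent_tokenize(text)                  # Step 1: sentence segmentation
tokens = word_tokenize(sentences[0])             # Step 2: word tokenization
stems = [PorterStemmer().stem(t) for t in tokens]             # Step 3: stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # Step 4: lemmatization
content = [t for t in tokens                     # Step 5: drop stop words
           if t.lower() not in stopwords.words("english")]
tagged = pos_tag(tokens)                         # Step 7: POS tagging
entities = ne_chunk(tagged)                      # Steps 8-9: NER and chunking

print(sentences, stems, lemmas, content, tagged, sep="\n")
```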
Phases of NLP
1. Lexical Analysis
The first phase of NLP is lexical analysis. This phase scans the source text as a stream of characters
and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and
words.
2. Syntactic Analysis
Syntactic analysis is used to check grammar and word arrangement, and it shows the relationships
among the words.
In the real world, "Agra goes to the Poonam" does not make any sense, so this sentence is rejected by
the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
4. Discourse Integration
Discourse Integration depends upon the sentences that precede it and also invokes the meaning of the
sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect by
applying a set of rules that characterize cooperative dialogues.
Information Retrieval (IR) can be defined as a software program that deals with the organization,
storage, retrieval, and evaluation of information from document repositories, particularly textual
information. Information Retrieval is the activity of obtaining material of an unstructured nature,
usually text, that satisfies an information need from within large collections stored on computers.
For example, an information retrieval process begins when a user enters a query into the system.
An IR system has the ability to represent, store, organize, and access information items. A set of
keywords is required to search. Keywords are what people are searching for in search engines. These
keywords summarize the description of the information.
What is an IR Model?
An Information Retrieval (IR) model selects and ranks the document that is required by the user or the
user has asked for in the form of a query. The documents and the queries are represented in a similar
manner, so that document selection and ranking can be formalized by a matching function that returns a
retrieval status value (RSV) for each document in the collection. Many of the Information Retrieval
systems represent document contents by a set of descriptors, called terms, belonging to a vocabulary V.
An IR model determines the query-document matching function. The main steps of an information retrieval system are:
Acquisition: In this step, the selection of documents and other objects from various web
resources that consist of text-based documents takes place. The required data is collected by web
crawlers and stored in the database.
Representation: It consists of indexing, which uses free-text terms, controlled vocabulary, and both
manual and automatic techniques. For example, abstracting involves summarizing, and a bibliographic
description contains the author, title, source, date, and metadata.
File Organization: There are two basic file organization methods: sequential (documents are stored
document by document) and inverted (records are stored term by term, with a list of records under
each term); a combination of both can also be used (see the inverted-index sketch after this list).
Query: An IR process starts when a user enters a query into the system. Queries are formal
statements of information needs, for example, search strings in web search engines. In
information retrieval, a query does not uniquely identify a single object in the collection. Instead,
several objects may match the query, perhaps with different degrees of relevancy.
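As a sketch of the inverted file organization and keyword-based query matching described above, the snippet below builds a tiny inverted index over three hypothetical documents and answers AND-style keyword queries.

```python
# Minimal sketch of an inverted index (term -> documents containing it)
# and a simple AND-style keyword query. The toy documents are hypothetical.
from collections import defaultdict

documents = {
    1: "information retrieval deals with storage and retrieval of documents",
    2: "an inverted index maps each term to the documents that contain it",
    3: "web search engines rank documents for a user query",
}

# Indexing (representation step): record, term by term, which documents hold it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def search(query):
    """Return the ids of documents containing every keyword of the query."""
    result = None
    for kw in query.lower().split():
        postings = inverted_index.get(kw, set())
        result = postings if result is None else result & postings
    return sorted(result or [])

print(search("documents retrieval"))   # -> [1]
print(search("documents query"))       # -> [3]
```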
User Interaction With Information Retrieval System
The user's information need first has to be translated into a query. In an information retrieval system,
the query is a set of words that convey the semantics of the information that is required, whereas in
a data retrieval system, a query expression is used to convey the constraints which must be satisfied by
the objects. Example: a user wants to search for something but ends up searching for something else;
this means that the user is browsing and not searching. The above figure shows the interaction of the
user through different tasks.
3. Information Extraction
Information Extraction's main goal is to find meaningful information in a document set. IE is
one type of IR. IE automatically extracts structured information from a set of unstructured documents or
a corpus. IE focuses more on texts that can be read and written by humans and utilizes them with NLP
(natural language processing). An information retrieval system, by contrast, finds information that is
relevant to the user's information need and that is stored on a computer; it returns documents of text
(in unstructured form) from a large collection of corpora.
An information extraction system used for online text extraction should come at a low cost, be flexible
to develop, and be easy to adapt to new domains. Take natural language processing by a machine as an
example: here IE (information extraction) is able to recognize the information a person needs. Using
information extraction, we want to make a machine
capable of extracting structured information from documents. The importance of an information
extraction system is determined by the growing amount of information available in unstructured form
(data without metadata), like on the Internet. This knowledge can be made more accessible utilizing
transformation into relational form, or by marking-up with XML tags.
Automated learning systems are commonly used in information extraction. This type of IE system
decreases the faults in information extraction and also reduces dependence on a domain by diminishing
the requirement for supervision. IE of structured information relies on the basic content management
principle: "Content must be in context to have value". Information Extraction is more difficult than
Information Retrieval.
4. Machine Translation
Machine translation of languages refers to the use of artificial intelligence (AI) and machine learning
algorithms to automatically translate text or speech from one language to another. This technology has
been developed over the years and has become increasingly sophisticated, with the ability to produce
accurate translations across a wide range of languages.
Machine translation (MT) is a subfield of natural language processing (NLP) that focuses on
automatically translating text or speech from one language to another. It involves the use of
computational algorithms and models to enable computers to understand the structure, meaning, and
context of the source language and generate equivalent translations in the target language.
1. Rule-Based Machine Translation (RBMT): RBMT relies on a set of linguistic rules and dictionaries that
are manually created by human experts. These rules define the grammatical and syntactic structures of
both the source and target languages. RBMT systems often require extensive linguistic knowledge and
rule engineering, making them labor-intensive to develop and maintain.
2. Statistical Machine Translation (SMT): SMT relies on statistical models that are trained on large
parallel corpora, which are collections of aligned source and target language texts. These models learn
the probabilities of word or phrase translations based on their co-occurrence patterns in the training
data. SMT models use techniques like phrase-based translation and language models to generate
translations. They are flexible and can handle complex language phenomena but may struggle with
translating rare or unseen phrases.
3. Neural Machine Translation (NMT): NMT is an advanced approach that utilizes neural networks,
specifically recurrent neural networks (RNNs) or transformers, to translate text. NMT models learn an
end-to-end mapping between source and target language sequences, allowing them to capture long-
range dependencies and generate fluent translations. NMT has shown significant improvements over
traditional approaches and is currently the dominant method in machine translation research and
applications.
4. Hybrid Approaches: Hybrid approaches combine the strengths of different machine translation
techniques. For example, a hybrid system may use rule-based methods to handle specific linguistic rules
and exceptions, while employing statistical or neural models for general translation tasks.
Machine translation faces several challenges, including ambiguity, idiomatic expressions, word sense
disambiguation, and handling language-specific nuances. Translating accurately and capturing the
intended meaning can be particularly challenging when dealing with languages that have different word
orders or grammatical structures.
Despite these challenges, machine translation has made significant advancements over the years and is
widely used for various applications, including web page translation, multilingual customer support,
localization of software and content, and language accessibility. Machine translation systems continue
to improve with the availability of larger parallel corpora, advancements in neural network
architectures, and the integration of techniques such as transfer learning and reinforcement learning.
i. Syntactic Analysis
Syntactic analysis, or parsing, or syntax analysis, is a phase of NLP. The purpose of this phase is to draw
the exact, or dictionary, meaning from the text. Syntax analysis checks the text for meaningfulness by
comparing it to the rules of formal grammar; a phrase like "hot ice-cream", for example, would be
rejected by the semantic analyzer.
In this sense, syntactic analysis or parsing may be defined as the process of analyzing strings of
symbols in natural language conforming to the rules of formal grammar. The word 'parsing' originates
from the Latin word 'pars', which means 'part'.
Concept of Parser
It is used to implement the task of parsing. It may be defined as the software component designed for
taking input data (text) and giving structural representation of the input after checking for correct syntax
as per formal grammar. It also builds a data structure generally in the form of parse tree or abstract
syntax tree or other hierarchical structure.
The main roles of the parser include −
Types of Parsing
Top-down Parsing
Bottom-up Parsing
Top-down Parsing
In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries
to transform the start symbol into the input. The most common form of top-down parsing uses recursive
procedures to process the input. The main disadvantage of recursive descent parsing is backtracking.
Bottom-up Parsing
In this kind of parsing, the parser starts with the input symbols and tries to construct the parse tree up
to the start symbol.
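A minimal recursive-descent (top-down) parser for a tiny hypothetical grammar of arithmetic expressions illustrates the idea; it consumes tokens from left to right and builds a parse tree as nested tuples.

```python
# Minimal sketch of top-down (recursive descent) parsing for a tiny grammar:
#   expr -> term (('+' | '-') term)*
#   term -> NUMBER | '(' expr ')'
import re

def tokenize(text):
    return re.findall(r"\d+|[()+\-]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        token = self.peek()
        if token is None or (expected and token != expected):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        return token

    def expr(self):                       # expr -> term (('+'|'-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.term())
        return node

    def term(self):                       # term -> NUMBER | '(' expr ')'
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return ("num", self.eat())

print(Parser(tokenize("1 + (2 - 3) + 4")).expr())
# ('+', ('+', ('num', '1'), ('-', ('num', '2'), ('num', '3'))), ('num', '4'))
```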
The semantic analysis looks after the meaning. It allocates the meaning to all the structures built
by the syntactic analyzer. Then every syntactic structure and the objects are mapped together into
the task domain. If mapping is possible the structure is sent, if not then it is rejected. For
example, “hot ice-cream” will give a semantic error. During semantic analysis two main
operations are executed:
First, each separate word will be mapped with appropriate objects in the database. The
dictionary meaning of every word will be found. A word might have more than one
meaning.
Secondly, all the meanings of each different word will be integrated to find a proper
correlation between the word structures. This process of determining the correct meaning
is called lexical disambiguation. It is done by associating each word with the context.
This process defined above can be used to determine the partial meaning of a sentence. However
semantic and syntax are two completely contrasting concepts. It might be possible that a
syntactically correct sentence is semantically incorrect.
For example, “A rock smelled the colour nine.” It is syntactically correct as it obeys all the rules
of English, but is semantically incorrect. The semantic analysis verifies that a sentence is abiding
by the rules and creates correct information.
While performing the morphological analysis, each particular word is analyzed. Non-word
tokens such as punctuation are removed from the words. Hence the remaining words are
assigned categories. For instance, Ram’s iPhone cannot convert the video from .mkv to .mp4. In
Morphological analysis, word by word the sentence is analyzed.
So here, Ram is a proper noun, Ram’s is assigned as possessive suffix and .mkv and .mp4 is
assigned as a file extension.
As shown above, the sentence is analyzed word by word and each word is assigned a syntactic
category. The file extensions present in the sentence are also identified; in the above example they
behave as adjectives. The possessive suffix is identified as well.
This is a very important step, as the interpretation of prefixes and suffixes depends on the syntactic
category of the word. For example, "swims" and "swim's" are different: the suffix -s can mark a plural
noun or a third-person singular verb, while 's marks possession. If a prefix or suffix is interpreted
incorrectly, the meaning and understanding of the sentence change completely. The interpretation
assigns a category to each word and hence removes the uncertainty from the word.
6. Machine Learning
Machine learning algorithms create a mathematical model that, without being explicitly programmed,
aids in making predictions or decisions with the assistance of sample historical data, or training data. For
the purpose of developing predictive models, machine learning brings together statistics and computer
science. Algorithms that learn from historical data are either constructed or utilized in machine learning.
The performance will rise in proportion to the quantity of information we provide.
A machine can learn if it can gain more data to improve its performance.
A machine learning system builds prediction models, learns from previous data, and predicts the output
of new data whenever it receives it. The amount of data helps to build a better model that accurately
predicts the output, which in turn affects the accuracy of the predicted output.
The demand for machine learning is steadily rising. Because it is able to perform tasks that are too complex
for a person to directly implement, machine learning is required. Humans are constrained by our inability
to manually access vast amounts of data; as a result, we require computer systems, which is where
machine learning comes in to simplify our lives.
By providing them with a large amount of data and allowing them to automatically explore the data, build
models, and predict the required output, we can train machine learning algorithms. The cost function can
be used to determine the amount of data and the machine learning algorithm's performance. We can
save both time and money by using machine learning.
The significance of machine learning can be easily understood from its use cases. Currently, machine
learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by
Facebook, and so on. Various top companies such as Netflix and Amazon have built machine learning
models that use a huge amount of data to analyze user interest and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:
PageRank (PR) is an algorithm used by Google Search to rank websites in their search engine results.
PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring
the importance of website pages. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough estimate of
how important the website is. The underlying assumption is that more important websites are likely to
receive more links from other websites.
It is not the only algorithm used by Google to order search engine results, but it is the first algorithm
that was used by the company, and it is the best-known.
Algorithm
The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person
randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of
documents of any size. It is assumed in several research papers that the distribution is evenly divided
among all documents in the collection at the beginning of the computational process. The PageRank
computations require several passes, called “iterations”, through the collection to adjust approximate
PageRank values to more closely reflect the theoretical true value.
Simplified algorithm
1. Initialization:
- Assume a small universe of four web pages: A, B, C, and D. Each page starts with an equal initial PageRank of 0.25 (1 divided by the number of pages).
2. Link Structure:
- Ignore links from a page to itself or multiple links from one page to another.
- If a page links to other pages, the PageRank it transfers to its targets is divided equally among its
outbound links.
3. Iteration 1:
- If only pages B, C, and D link to A, each transfers 0.25 PageRank to A, summing up to 0.75 for A.
4. Iteration 2:
- If page B has links to pages A and C, it transfers half of its PageRank (0.125 each) to A and C.
- Page D, with links to A, B, and C, transfers one-third of its PageRank (approximately 0.083) to A.
5. Final PageRank:
- After iterations, the PageRank for each page stabilizes.
- Page A's final PageRank is approximately 0.458, considering the contributions from B, C, and D.
6. Generalization:
- PageRank for any page is the sum of contributions from pages linking to it, divided by the number of
outbound links from each contributing page.
- It's like each page distributes its importance to the pages it links to, sharing equally among its links.
7. Damping Factor:
- Introduce a damping factor to simulate the probability that a user randomly clicks a link on a page
rather than following the links on the page.
- The damping factor is like a tax on the PageRank, ensuring that not all PageRank is transferred, and
some "evaporates" in the process.
In essence, the algorithm models the idea that more important pages, as determined by their incoming
links, contribute more to the PageRank of the pages they link to. The process iterates until the PageRank
values stabilize, providing a measure of the relative importance of each page in the given web graph.
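A minimal sketch of this iterative computation with a damping factor is given below; the four-page link graph is hypothetical and the damping factor of 0.85 is the commonly quoted value.

```python
# Minimal sketch of the iterative PageRank computation with a damping factor.
# The four-page link graph below is hypothetical.

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
damping = 0.85
n = len(pages)

# Start with the rank evenly divided among all pages.
rank = {page: 1.0 / n for page in pages}

for _ in range(50):   # iterations until the values stabilise
    new_rank = {}
    for page in pages:
        # Sum the shares passed on by every page that links to this one;
        # each linking page divides its rank equally among its outbound links.
        incoming = sum(rank[p] / len(links[p]) for p in pages if page in links[p])
        # Damping: only a fraction of the rank is transferred through links;
        # the rest models a user jumping to a random page.
        new_rank[page] = (1 - damping) / n + damping * incoming
    rank = new_rank

print({page: round(score, 3) for page, score in rank.items()})
```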