Knowledge Inference AI
KNOWLEDGE INFERENCE
In simple terms, AI inference involves making predictions or decisions based on previously trained
models and input data. The significance of AI inference is vast, touching various sectors and
revolutionizing the way we approach problem-solving and decision-making.
Alternatively:
Knowledge inference refers to acquiring new knowledge from existing facts based on certain rules and
constraints. One way of representing these rules and constraints is through the use of logic rules,
formally known as knowledge representation. The mechanism behind inferring new knowledge based
on the existing facts and logic rules is typically known as reasoning.
The frame concept, introduced by Minsky in 1974, is foundational in the field of knowledge
representation and is discussed further under Frame-Based Systems below.
Inference rules:
Inference rules are templates for generating valid arguments. They are applied to derive proofs in
artificial intelligence, where a proof is a sequence of conclusions that leads to the desired goal.
Among the logical connectives, implication plays the most important role in inference rules. The
following terminology is related to inference rules:
o Implication: It is one of the logical connectives and can be represented as P → Q ("if P then Q").
It is a Boolean expression.
o Converse: The converse of an implication is obtained by swapping its two sides, so the right-hand
proposition moves to the left-hand side and vice versa. It can be written as Q → P.
o Contrapositive: The contrapositive is obtained by negating both sides of the implication and
swapping them. It can be represented as ¬Q → ¬P.
o Inverse: The inverse is obtained by negating both sides of the implication. It can be represented
as ¬P → ¬Q.
From the above terms, some of the compound statements are equivalent to each other, which we can
prove using a truth table:
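P | Q | P → Q | Q → P | ¬Q → ¬P | ¬P → ¬Q
T | T |   T   |   T   |    T    |    T
T | F |   F   |   T   |    F    |    T
F | T |   T   |   F   |    T    |    F
F | F |   T   |   T   |    T    |    T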
Hence, from the above truth table, we can see that P → Q is equivalent to ¬Q → ¬P, and that Q → P is
equivalent to ¬P → ¬Q.
1. Modus Ponens:
The Modus Ponens rule states that if P → Q is true and P is true, then Q will also be true. It can be
represented as:
P → Q, P ⊢ Q
Example:
Statement-1: "If I am sleepy then I go to bed" ==> P → Q
Statement-2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q
Hence, we can say that if P → Q is true and P is true, then Q will be true.
Proof by Truth table:
2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It can be
represented as:
P → Q, ¬Q ⊢ ¬P
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P → Q is true and Q → R is true, then P → R will also be
true. It can be represented with the following notation:
P → Q, Q → R ⊢ P → R
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
Proof by truth table:
4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will also be true. It can
be represented as:
P ∨ Q, ¬P ⊢ Q
Example:
Statement-1: Today is Sunday or Monday. ==>P∨Q
Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q
Proof by truth-table:
5. Addition:
The Addition rule is one of the common inference rules, and it states that if P is true, then P ∨ Q will also
be true. It can be represented as:
P ⊢ P ∨ Q
Example:
Statement: "Today is Sunday." ==> P
Conclusion: "Today is Sunday or it is raining." ==> P ∨ Q
6. Simplification:
The Simplification rule states that if P ∧ Q is true, then P will be true and Q will also be true. It can be
represented as:
P ∧ Q ⊢ P and P ∧ Q ⊢ Q
Proof by Truth-Table:
7. Resolution:
The Resolution rule states that if P ∨ Q is true and ¬P ∨ R is true, then Q ∨ R will also be true. It can be
represented as:
P ∨ Q, ¬P ∨ R ⊢ Q ∨ R
Proof by Truth-Table:
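Since each of these truth tables can be enumerated mechanically, the short Python sketch below
brute-forces all truth assignments of P, Q, R and confirms that every rule above is valid. The helper
names (implies, valid) are illustrative, not from any particular library.

from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is true and b is false
    return (not a) or b

def valid(premises, conclusion):
    # A rule is valid if the conclusion holds in every assignment where all premises hold
    for p, q, r in product([True, False], repeat=3):
        if all(prem(p, q, r) for prem in premises) and not conclusion(p, q, r):
            return False
    return True

# Modus Ponens: P -> Q, P |- Q
print(valid([lambda p, q, r: implies(p, q), lambda p, q, r: p],
            lambda p, q, r: q))                                  # True
# Modus Tollens: P -> Q, not Q |- not P
print(valid([lambda p, q, r: implies(p, q), lambda p, q, r: not q],
            lambda p, q, r: not p))                              # True
# Hypothetical Syllogism: P -> Q, Q -> R |- P -> R
print(valid([lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
            lambda p, q, r: implies(p, r)))                      # True
# Disjunctive Syllogism: P or Q, not P |- Q
print(valid([lambda p, q, r: p or q, lambda p, q, r: not p],
            lambda p, q, r: q))                                  # True
# Resolution: P or Q, not P or R |- Q or R
print(valid([lambda p, q, r: p or q, lambda p, q, r: (not p) or r],
            lambda p, q, r: q or r))                             # True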
Applications of Inference in AI
1. Medical Research and Diagnoses: AI aids in medical research and diagnoses by analyzing
patient data to provide optimized treatment plans and prognoses.
2. Recommendation Systems and Personalized Advertisements: E-commerce platforms
utilize inference to suggest products based on user preferences, enhancing user experience and
engagement.
3. Self-Driving Vehicles: Inference enables self-driving cars to interpret sensor data and navigate
through dynamic environments safely and efficiently.
FRAME BASED SYSTEM
Frames are data structures used in AI to represent stereotypical situations or scenarios. They
encapsulate information about objects, events, and their interrelationships within a particular context.
Each frame consists of a set of attributes and values, forming a template for understanding specific
situations.
Key Components of Frames
Frames are essential for structuring knowledge in AI, and understanding their key components helps
in effectively utilizing them.
Here are the main components of frames, along with examples to illustrate their use:
1. Slots
Slots are attributes or properties of a frame. They represent the different aspects or characteristics of
the frame’s concept.
Example: For a “Person” frame, slots might include:
• Name: The individual’s name
• Age: The individual’s age
• Occupation: The individual’s profession
• Address: The individual’s home address
2. Facets
Facets provide additional details or constraints for slots, defining acceptable values or specifying how
slots should be used.
Example: For the “Age” slot in the “Person” frame:
• Type: Integer
• Range: 0 to 120
• Default Value: 30
3. Default Values
Default values are predefined values assigned to slots if no specific value is provided. They offer a
baseline that can be overridden with more specific information.
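A minimal sketch of how slots, facets, and default values might be represented in Python is given
below; the Frame class and its fields are illustrative assumptions, not a standard API.

class Frame:
    """An illustrative frame: a named template with slots, per-slot facets, and defaults."""
    def __init__(self, name, slots=None, facets=None, defaults=None):
        self.name = name
        self.slots = dict(slots or {})        # slot name -> value (None if not yet filled)
        self.facets = dict(facets or {})      # slot name -> constraints on allowed values
        self.defaults = dict(defaults or {})  # slot name -> value used when none is given

    def get(self, slot):
        # Return the stored value if present, otherwise fall back to the default value
        value = self.slots.get(slot)
        return value if value is not None else self.defaults.get(slot)

person = Frame(
    "Person",
    slots={"Name": "Asha", "Age": None, "Occupation": "Engineer", "Address": None},
    facets={"Age": {"type": int, "range": (0, 120)}},
    defaults={"Age": 30},
)
print(person.get("Age"))   # 30 -- no specific age was given, so the default applies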
Frame Inheritance
Frame inheritance is a method used in knowledge representation systems to manage and organize
information efficiently. It allows one frame (child) to inherit attributes and properties from another
frame (parent), creating a hierarchical structure. This method facilitates the reuse and extension of
existing knowledge.
Key Concepts of Frame Inheritance
1. Parent Frame: The frame from which attributes and properties are inherited. It defines
general attributes that are common to all its child frames.
2. Child Frame: The frame that inherits attributes and properties from the parent frame. It can
add new attributes or override existing ones to represent more specific information.
3. Inheritance Hierarchy: A tree-like structure where frames are organized hierarchically. Each
child frame can inherit from multiple parent frames, forming a network of relationships.
4. Overriding: When a child frame modifies or replaces an attribute inherited from the parent
frame with a more specific value or definition.
5. Extension: Adding new attributes or properties to a child frame that are not present in the
parent frame.
How Frame Inheritance Works?
1. Define Parent Frame: Create a general frame with common attributes. For example, a
“Vehicle” frame might include attributes like “Make,” “Model,” and “Year.”
2. Create Child Frame: Define a more specific frame that inherits from the parent frame. For
example, a “Car” frame might inherit attributes from the “Vehicle” frame and add specific
attributes like “Number of Doors.”
3. Use Inherited Attributes: The child frame automatically includes all attributes from the
parent frame, providing a structured way to build on existing knowledge.
4. Override or Extend: Modify or add attributes in the child frame as needed to refine the
representation. For example, the “Car” frame might override the “Year” attribute to specify a
range of acceptable values.
Example of Frame Inheritance
Let’s consider an example with a hierarchy of frames in a library system:
• Parent Frame: “LibraryItem”
o Attributes:
o Title
o Author
o Publication Year
• Child Frame 1: “Book” (inherits from “LibraryItem”)
o Inherited Attributes: Title, Author, Publication Year
o Extended Attributes:
o ISBN
o Genre
• Child Frame 2: “Magazine” (inherits from “LibraryItem”)
o Inherited Attributes: Title, Author, Publication Year
o Extended Attributes:
o Issue Number
o Publisher
In this example:
• The “Book” frame inherits the common attributes from the “LibraryItem” frame and adds
specific attributes related to books.
• The “Magazine” frame also inherits from “LibraryItem” but adds attributes specific to
magazines.
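A standalone variant of the earlier Frame sketch, extended with a parent link, shows how the library
hierarchy above could be modelled; the slot values ("Frames in AI", "ACM", and so on) are purely
illustrative.

class Frame:
    """A frame with named slots and an optional parent frame to inherit from."""
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        # A value set on the child wins (overriding); otherwise look up the parent chain
        if slot in self.slots and self.slots[slot] is not None:
            return self.slots[slot]
        return self.parent.get(slot) if self.parent else None

library_item = Frame("LibraryItem", Title=None, Author=None, Publication_Year=2020)
book = Frame("Book", parent=library_item,                 # inherits LibraryItem slots
             Title="Frames in AI", ISBN=None, Genre="Non-fiction")
magazine = Frame("Magazine", parent=library_item,         # inherits and overrides
                 Issue_Number=42, Publisher="ACM", Publication_Year=2024)

print(book.get("Publication_Year"))      # 2020 -- inherited from LibraryItem
print(magazine.get("Publication_Year"))  # 2024 -- overridden in the child frame
print(book.get("Title"))                 # 'Frames in AI' -- slot set on the child frame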
Applications of Frames in AI
1. Natural Language Processing (NLP): In NLP, frames are used to understand the context of
words and sentences. For example, a “booking” frame might be used to interpret requests for
reservations, extracting relevant information such as date, time, and number of people.
2. Expert Systems: Expert systems use frames to represent knowledge about specific domains.
For instance, a medical diagnosis system might employ frames to represent various diseases,
symptoms, and treatment options.
3. Robotics: Frames help robots make sense of their environment by providing structured
information about objects and their properties. This allows robots to perform tasks such as
object recognition and manipulation.
4. Cognitive Modeling: Frames are used in cognitive modeling to simulate human thought
processes. By representing knowledge in frames, researchers can create models that mimic
human reasoning and decision-making.
Advantages of Using Frames
• Organized Knowledge: Frames help in structuring information in a way that mirrors real-
world scenarios, making it easier for AI systems to understand and process.
• Flexibility: Frames can be easily modified or extended to incorporate new information or
adapt to changing contexts.
• Reusability: Once defined, frames can be reused across different applications or scenarios,
promoting consistency and efficiency.
Challenges and Limitations
• Complexity: As the number of frames and their interrelationships increase, managing and
maintaining the frames can become complex.
• Context Sensitivity: Frames may struggle to adapt to highly dynamic or ambiguous situations
where predefined structures may not fit.
• Scalability: For large-scale systems, the sheer volume of frames and their interactions can pose
challenges in terms of performance and resource management.
INFERENCE ENGINE
An inference engine is a key component of an expert system, one of the earliest types of artificial
intelligence. An expert system applies logical rules to data to deduce new information; the primary
function of the inference engine is to infer that new information from a set of rules and known facts.
An inference engine works in one of two modes:
a. Forward chaining
b. Backward chaining
FORWARD CHAINING
Forward chaining is also known as forward deduction or the forward reasoning method when using an
inference engine. It is a form of reasoning which starts with the atomic sentences in the knowledge
base and applies inference rules (Modus Ponens) in the forward direction to extract more data until a
goal is reached.
The forward-chaining algorithm starts from the known facts, triggers all rules whose premises are
satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
o It is a bottom-up approach, as it moves from the bottom (known facts) to the top (goal).
o It is a process of making conclusions from known facts or data, starting from the initial state and
working towards the goal state.
o It is also called data-driven reasoning, because we reach the goal using the available data.
o It is commonly used in expert systems, such as CLIPS, and in business and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country nano, an
enemy of America, has some missiles, and all the missiles were sold to it by Robert, who is an
American citizen."
To solve the above problem, first, we will convert all the above facts into first-order definite clauses,
and then we will use a forward-chaining algorithm to reach the goal.
o It is a crime for an American to sell weapons to hostile nations. (Let x, y, and z be variables.)
American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) → Criminal(x) ...(1)
o Country nano has some missiles: ∃x Owns(nano, x) ∧ Missile(x). This can be written as two definite
clauses by using Existential Instantiation, introducing a new constant T1:
Owns(nano, T1) ......(2)
Missile(T1) .......(3)
o All of the missiles were sold to country nano by Robert.
∀x Missile(x) ∧ Owns(nano, x) → Sells(Robert, x, nano) ......(4)
o Missiles are weapons.
Missile(x) → Weapon(x) .......(5)
o An enemy of America is known as hostile.
Enemy(x, America) → Hostile(x) ........(6)
o Country nano is an enemy of America.
Enemy(nano, America) .........(7)
o Robert is an American.
American(Robert) ..........(8)
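Starting from facts (2), (3), (7) and (8), forward chaining first fires rule (5) to derive Weapon(T1),
rule (4) to derive Sells(Robert, T1, nano), and rule (6) to derive Hostile(nano); rule (1) then fires
and derives the goal Criminal(Robert). A minimal Python sketch of this process follows; the rules are
grounded by hand for brevity (a real first-order forward chainer would use unification), and all names
are illustrative.

# Known facts, written as ground atoms (strings), following clauses (2)-(3) and (7)-(8)
facts = {"American(Robert)", "Missile(T1)", "Owns(nano,T1)", "Enemy(nano,America)"}

# Rules (1) and (4)-(6) from the text, grounded with x = Robert, y = T1, z = nano
rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),                                   # rule (5)
    ({"Missile(T1)", "Owns(nano,T1)"}, "Sells(Robert,T1,nano)"),       # rule (4)
    ({"Enemy(nano,America)"}, "Hostile(nano)"),                        # rule (6)
    ({"American(Robert)", "Weapon(T1)",
      "Sells(Robert,T1,nano)", "Hostile(nano)"}, "Criminal(Robert)"),  # rule (1)
]

def forward_chain(known, rules):
    # Keep firing every rule whose premises are all known until nothing new can be added
    added = True
    while added:
        added = False
        for premises, conclusion in rules:
            if premises.issubset(known) and conclusion not in known:
                known.add(conclusion)
                added = True
    return known

derived = forward_chain(set(facts), rules)
print("Criminal(Robert)" in derived)   # True -- the goal is reached from the facts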
BACKWARD CHAINING
Backward chaining is also known as backward deduction or backward reasoning. It is a goal-driven
method: it starts from the goal, here Criminal(Robert), finds the rules whose conclusion matches that
goal, and then tries to prove each of those rules' premises in turn, working backwards until only known
facts remain. In the example above, proving Criminal(Robert) with rule (1) requires proving
American(Robert), Weapon(T1), Sells(Robert, T1, nano) and Hostile(nano); the first is fact (8), and the
rest follow from rules (5), (4) and (6) together with facts (2), (3) and (7).
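A matching backward-chaining sketch, reusing the facts and rules defined in the forward-chaining
sketch above (again an illustration rather than a full first-order prover):

def backward_chain(goal, known, rules):
    # Goal-driven search: a goal holds if it is a known fact, or if some rule concludes
    # it and every premise of that rule can itself be proved recursively.
    if goal in known:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(backward_chain(p, known, rules) for p in premises):
            return True
    return False

# `facts` and `rules` come from the forward-chaining sketch above
print(backward_chain("Criminal(Robert)", facts, rules))   # True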
FUZZY REASONING
Fuzzy logic admits multiple logical values: the truth value of a variable or statement can be any number
between 0 and 1. The concept was introduced by Lotfi Zadeh in 1965, based on fuzzy set theory. It
captures the kind of graded, approximate possibilities that humans reason with naturally, rather than
the strict values produced by conventional computers.
In the Boolean system, only two possibilities (0 and 1) exist, where 1 denotes the absolute truth value
and 0 denotes the absolute false value. In a fuzzy system, however, there are multiple possibilities
between 0 and 1, which are partially false and partially true.
Fuzzy logic can be implemented in systems ranging from micro-controllers to workstation-based and
large networked systems in order to achieve a definite output, and it can be realized in hardware,
software, or a combination of both.
Characteristics of Fuzzy Logic
Following are the characteristics of fuzzy logic:
1. It is flexible and easy to understand and implement.
2. It helps to reduce the amount of complex, hand-crafted logic that humans would otherwise have
to specify.
3. It is well suited to problems that call for approximate or uncertain reasoning.
4. Its truth values range over the whole interval between 0 and 1, so a statement can be partially
true and partially false at the same time.
5. It allows users to build non-linear functions of arbitrary complexity.
6. In fuzzy logic, everything is a matter of degree.
7. Any logical system can be fuzzified.
8. It is modeled on natural language, which is itself imprecise.
9. It is also used by quantitative analysts to improve the execution of their algorithms.
10. It allows easy integration with conventional programming.
Architecture of a Fuzzy Logic System
In the architecture of a Fuzzy Logic system, each component plays an important role. The architecture
consists of the following four components:
1. Rule Base
2. Fuzzification
3. Inference Engine
4. Defuzzification
These four components form a processing pipeline: crisp inputs are fuzzified, matched against the rule
base by the inference engine, and the resulting fuzzy output is defuzzified back into a crisp value.
1. Rule Base
The rule base is the component that stores the set of rules and the if-then conditions provided by
experts for controlling the decision-making system. Many recent developments in fuzzy theory offer
effective methods for designing and tuning fuzzy controllers, and these developments reduce the
number of fuzzy rules required.
2. Fuzzification
Fuzzification is the module or component that transforms the system inputs: it converts crisp numbers
into fuzzy sets. The crisp inputs are measured by sensors and passed into the control system for further
processing. In a typical Fuzzy Logic system this component divides the input signal into the following
five states:
o Large Positive (LP)
o Medium Positive (MP)
o Small (S)
o Medium Negative (MN)
o Large negative (LN)
3. Inference Engine
The inference engine is the central component of any Fuzzy Logic System (FLS), because all of the
information is processed here. It determines the degree to which the current fuzzy input matches each
rule, and on that basis decides which rules are to be fired for the given input. Once all applicable rules
have fired, their results are combined to develop the control actions.
4. Defuzzification
Defuzzification is the module or component that takes the fuzzy sets generated by the inference engine
and transforms them into a crisp value. It is the last step in the process of a fuzzy logic system, and the
crisp value is the form of output that the user can act on. Various techniques exist for this step, and the
designer has to select the one that best reduces the error for the application.
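To make the four components concrete, below is a minimal, self-contained sketch of a fuzzy controller
for a single input (a temperature reading). The membership functions, the three rules, and the
weighted-average defuzzification are illustrative assumptions, not part of any standard.

def tri(x, a, b, c):
    # Triangular membership function: 0 outside [a, c], rising to 1 at the peak b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def controller(temp):
    # 1. Fuzzification: crisp temperature -> degrees of membership in three fuzzy sets
    cold = tri(temp, -10, 0, 20)
    warm = tri(temp, 10, 20, 30)
    hot  = tri(temp, 20, 35, 50)

    # 2./3. Rule base + inference:
    #   IF cold THEN heater 90%, IF warm THEN heater 50%, IF hot THEN heater 10%
    heater_levels = {90: cold, 50: warm, 10: hot}   # crisp heater % -> firing strength

    # 4. Defuzzification: weighted average of the output levels by firing strength
    total = sum(heater_levels.values())
    if total == 0:
        return 0.0
    return sum(level * strength for level, strength in heater_levels.items()) / total

print(controller(5))    # only "cold" fires -> heater at 90%
print(controller(25))   # "warm" and "hot" both fire -> an intermediate heater setting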
Membership Function
The membership function is a function that defines the graph of a fuzzy set and allows users to quantify
linguistic terms: it maps each element x to a value between 0 and 1. It generalizes the indicator
(characteristic) function of a classical crisp set.
Membership functions were introduced in Zadeh's first papers on fuzzy sets. For a fuzzy set B over a
universe X, the membership function is defined as μB : X → [0, 1]; each element of X is mapped to a
value between 0 and 1, called its degree of membership or membership value.
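As a small illustration, a membership function can be written directly as a Python function mapping
each x in X to a degree in [0, 1]; the Gaussian shape, centre, and width below are arbitrary illustrative
choices.

import math

def mu_B(x, c=5.0, sigma=1.5):
    # Gaussian membership function: 1 at the centre c, tailing off towards 0 away from it
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

print(round(mu_B(5.0), 2))  # 1.0  -> full membership at the centre
print(round(mu_B(8.0), 2))  # 0.14 -> weak membership far from the centre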
Certainty factors
The Certainty Factor (CF) is a numeric value that expresses how strongly an event or statement is
believed to be true. It is similar in spirit to a probability, but the difference is that a probability alone
does not tell the agent what to do. Based on the probability and the other knowledge the agent has, a
certainty factor is assigned, and from it the agent decides whether to declare the statement true or
false.
The value of the certainty factor lies between -1.0 and +1.0: a value of -1.0 means the statement can
never be true, a value of +1.0 means it can never be false, and a value of 0 means the agent has no
information about the event or the situation. After analyzing a situation, the certainty factor will be
some positive or negative value within this range.
A minimum certainty factor, also known as the threshold value, is decided for every case; the agent
uses it to decide whether the statement is true or false. For example, if the minimum certainty factor
(threshold value) is 0.4 and the value of CF falls below it, the agent claims that the statement is false.
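A small sketch of how an agent might act on a certainty factor and a threshold, following the
description above (the numbers and the function name are illustrative):

def decide(cf, threshold=0.4):
    # cf ranges from -1.0 (can never be true) to +1.0 (can never be false); 0 = no information.
    # Following the description above: at or above the threshold the agent declares the
    # statement true, otherwise it declares it false.
    return "true" if cf >= threshold else "false"

print(decide(0.7))   # "true"
print(decide(0.1))   # "false" -- the CF is below the 0.4 threshold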
Uncertainty factors
When an agent perceives information from the environment, the main problem that arises is that there
is always some uncertainty in its observations. This is because the world is an enormous entity, and the
surroundings under study are not always well defined, so some estimation is needed before any
conclusion can be reached.
Human beings face this uncertainty many times every day, yet they still manage to make successful
decisions. This is because humans have strong estimation and decision-making abilities: whenever such
a situation arises, the alternative with the best expected outcome is chosen. Artificial agents, on the
other hand, struggle to make proper decisions in such environments, because if the information
available to them is inaccurate, they cannot derive the right decision from their knowledge base.
Uncertainty Example
As a real-life example, when buying vegetables, humans can easily distinguish between the different
kinds by their color, size, texture, and so on. There is still uncertainty in making the right choice,
because the vegetables may not look exactly as described: some may be misshapen, some may differ
from the usual size, and there are many other such variations. In spite of all this, humans have no
difficulty in situations like these.
The same task becomes a hurdle when the decision has to be made by a computer-based agent. We can
feed the agent information about how the vegetables look, but we cannot accurately define the exact
shape and size of each one, because they all vary. As a solution, only the basic information is provided
to the agent, and based on it the agent has to make estimates to work out which vegetable is in front of
it. This strategy is followed in the design of almost every AI-based agent, so there must be a proper
method by which the agent can make such estimations by itself, without help or input from human
beings.
Reasons for Uncertainty in Artificial Intelligence
The following are the reasons for uncertainty in Artificial Intelligence:
1. Partially observable environment
The entire environment is not always within the agent's reach; some parts are out of reach and
therefore remain unobserved. The decisions the agent makes do not include information from these
areas, so the conclusions drawn may differ from the actual situation.
2. Dynamic Environment
The environment is dynamic: changes are constantly taking place in it. Decisions or calculations made
at one instant may no longer hold some time later, because of the changes that have occurred in the
surroundings in the meantime. If observations made at one instant are used later, the decision-making
can therefore become ambiguous.
3. Incomplete knowledge of the agent
If the agent has incomplete or insufficient knowledge about something, it cannot produce correct
results, because it does not fully understand the situation or how the situation should be handled.
4. Inaccessible areas in the environment
There are areas of the environment that are observable but not accessible to the agent. In such
situations the observation made is correct, but since the agent cannot act on these parts of the
environment, they remain unchanged by its actions. This does not affect the current decision, but it can
affect the estimates the agent makes in the future.
BAYESIAN NETWORK
A Bayesian belief network is a key technology for dealing with probabilistic events and for solving
problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their
conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability distribution,
and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple
events, we need a Bayesian network. It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series prediction, and decision making
under uncertainty.
A Bayesian network can be used to build models from data and expert opinions, and it consists of two
parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under
uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs (directed arrows) represent causal relationships or conditional dependencies between
random variables. A directed link connects a pair of nodes and indicates that one node directly
influences the other; if there is no directed link between two nodes, they are (conditionally)
independent of each other.
o As an example, consider a small network whose nodes are the random variables A, B, C, and D.
o If node B is connected to node A by a directed arrow from A to B, then node A is called the
parent of node B.
o A node such as C that has no directed link to node A is independent of node A.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which
determines the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and conditional probability, so let's
first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of
x1, x2, x3, ..., xn are known as the joint probability distribution.
The joint probability P[x1, x2, x3, ..., xn] can be expanded by the chain rule as:
P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi in a Bayesian network we can write the equation as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably
responds to a burglary but also goes off occasionally for minor earthquakes. Harry has two neighbors,
David and Sophia, who have taken the responsibility of informing Harry at work when they hear the
alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the telephone
ringing with the alarm and calls then as well. Sophia, on the other hand, likes to listen to loud music, so
sometimes she fails to hear the alarm. Here we would like to compute the probability of the burglary
alarm scenario described below.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has
occurred, and both David and Sophia have called Harry.
Solution:
o In the Bayesian network for this problem, Burglary and Earthquake are the parent nodes of
Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls
depend only on the Alarm node.
o The network encodes the assumptions that David and Sophia do not perceive the burglary
directly, do not notice minor earthquakes, and do not confer with each other before calling.
o The conditional distribution for each node is given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1, because the entries in a row represent an exhaustive set of
cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents requires 2^k probabilities (one for each
combination of parent values). Hence, if there are two parents, the CPT contains 4 probability
values.
List of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the event in the problem statement as the joint probability P[D, S, A, B, E], which can be
rewritten using the joint probability distribution and the network structure as:
P[D, S, A, B, E] = P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E]
o Let's take the observed probability for the Burglary and earthquake component:
o P(B= True) = 0.002, which is the probability of burglary.
o P(B= False)= 0.998, which is the probability of no burglary.
o P(E= True)= 0.001, which is the probability of a minor earthquake
o P(E= False)= 0.999, which is the probability that no earthquake occurred.
o From the formula of the joint distribution, we can write the problem statement in the form of a
probability distribution:
P[D, S, A, ¬B, ¬E] = P[D | A] · P[S | A] · P[A | ¬B ∧ ¬E] · P[¬B] · P[¬E]
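The conditional probability tables for Alarm, David, and Sophia are not reproduced in the text above, so
the sketch below plugs illustrative placeholder numbers into the factorization; only P(B = False) and
P(E = False) come from the figures given earlier.

# Probabilities taken from the text
p_b_false = 0.998      # P(B = False)
p_e_false = 0.999      # P(E = False)

# Placeholder CPT entries (illustrative assumptions, not given in the text)
p_a_given_not_b_not_e = 0.001   # P(A = True | B = False, E = False)
p_d_given_a = 0.91              # P(David calls | A = True)
p_s_given_a = 0.75              # P(Sophia calls | A = True)

# P[D, S, A, not B, not E] = P[D|A] * P[S|A] * P[A|not B, not E] * P[not B] * P[not E]
joint = p_d_given_a * p_s_given_a * p_a_given_not_b_not_e * p_b_false * p_e_false
print(joint)   # roughly 0.00068 with these placeholder values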