Unit-2 Inductive Classification (February 19, 2024)

Unit-2 Syllabus

Inductive Classification: The concept learning task, concept learning
as search through hypothesis space, general-to-specific ordering of
hypotheses, finding maximally specific hypotheses, version spaces and
the candidate elimination algorithm, learning conjunctive concepts, the
importance of inductive bias.

Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine
learning in practice as a data analyst or data scientist. In practice, the
process often looks like:
1. Start Loop
a) Understand the domain, prior knowledge and goals. Talk to
domain experts. Often the goals are unclear, and you have more
ideas to try than you can possibly implement.
b) Data integration, selection, cleaning and pre-processing.
This is often the most time-consuming part. It is important to have
high-quality data; the more data you have, the more cleaning it
usually needs, because real-world data is dirty. Garbage in,
garbage out.
c) Learning models. The fun part. This part is very mature. The
tools are general.
d) Interpreting results. Sometimes it does not matter how the
model works as long as it delivers results. Other domains require
that the model is understandable. You will be challenged by
human experts.
e) Consolidating and deploying discovered knowledge. The
majority of projects that are successful in the lab are not used in
practice. It is very hard to get something used.
2. End Loop
This is not a one-shot process; it is a cycle. You need to run the loop
until you get a result that you can use in practice. Also, the data can
change, requiring a new loop.

There are 3 properties by which you could choose an algorithm:


Search procedure
• Direct computation: No search, just calculate what is needed.
• Local: Search through the hypothesis space to refine the hypothesis.
• Constructive: Build the hypothesis piece by piece.
Timing
• Eager: Learning performed up front. Most algorithms are eager.
• Lazy: Learning performed at the time that it is needed.
Online vs Batch
• Online: Learning based on each pattern as it is observed.
• Batch: Learning over groups of patterns. Most algorithms are batch.
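
As a minimal illustration of the online/batch distinction, the sketch
below (plain Python, hypothetical data) estimates a mean both ways:
the batch version learns over the whole group of patterns at once,
while the online version updates its estimate one pattern at a time:

# A minimal sketch contrasting batch and online learning, using the
# simplest possible "model": an estimate of the mean of the data.
data = [2.0, 4.0, 6.0, 8.0]  # hypothetical observed patterns

# Batch: learn over the whole group of patterns at once.
batch_mean = sum(data) / len(data)

# Online: update the estimate incrementally as each pattern arrives.
online_mean, n = 0.0, 0
for x in data:
    n += 1
    online_mean += (x - online_mean) / n  # incremental mean update

print(batch_mean, online_mean)  # both print 5.0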

Designing a Learning System in Machine Learning :
According to Tom Mitchell, “A computer program is said to learn from
experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves
with experience E.”
Example: In Spam E-Mail detection,

• Task, T: To classify mails into Spam or Not Spam.


• Performance measure, P: Total percent of mails correctly
classified as “Spam” or “Not Spam”.
• Experience, E: A set of mails labelled “Spam” or “Not Spam”.

Steps for Designing Learning System are:

Step 1) Choosing the Training Experience: The first and most important
task is to choose the training data or training experience that will be fed
to the machine learning algorithm. The data or experience we feed to the
algorithm has a significant impact on the success or failure of the model,
so it should be chosen wisely.
The following attributes of the training experience affect success
or failure:
• Whether the training experience provides direct or indirect
feedback regarding the choices made. For example, while playing
chess the training experience can indicate that choosing one move
over another increases the chances of success.
• The second important attribute is the degree to which the learner
controls the sequence of training examples. For example, when
training data is first fed to the machine its accuracy is very low, but
as it gains experience by playing again and again against itself or an
opponent, the algorithm receives feedback and steers the chess game
accordingly.
• The third important attribute is how well the training experience
represents the distribution of examples over which performance will
be measured. A machine learning algorithm gains experience by
working through many different cases and examples; the more
examples it passes through, the more experience it gains and the
better its performance becomes.

Step 2) Choosing the Target Function: The next important step is
choosing the target function. Based on the knowledge fed to the
algorithm, the learner will choose a NextMove function that describes
which legal move should be taken. For example, while playing chess
against an opponent, when the opponent moves, the learning algorithm
decides which of the possible legal moves to take in order to succeed.
Step 3) Choosing a Representation for the Target Function: Once the
algorithm knows all the possible legal moves, the next step is to choose
a representation for the target function, e.g. linear equations, a
hierarchical graph representation, a tabular form, etc. Using this
representation, the NextMove function selects the move with the best
success rate. For example, if the machine has four possible moves in a
chess position, it will choose the optimal move, the one most likely to
lead to success.
Step 4) Choosing a Function Approximation Algorithm: An optimal
move cannot be read directly off the training data. The learner must
work through a set of examples, approximating which moves to choose
and receiving feedback on them. For example, when training data for
playing chess is fed to the algorithm, the machine will at first fail or
succeed more or less at random; from each failure or success it refines
its estimate of which move to choose next and of that move's success
rate.
Step 5) Final Design: The final design emerges after the system has
worked through many examples, failures and successes, and correct and
incorrect decisions. Example: Deep Blue, an ML-based intelligent
computer, won a chess match against the chess expert Garry Kasparov,
becoming the first computer to beat a human chess expert.

Introduction to the Hypothesis
MEANING OF HYPOTHESIS
The word hypothesis is made up of two Greek roots which mark it as a
kind of ‘sub-statement’: it is the presumptive statement of a proposition
which the investigation seeks to prove.
The word hypothesis consists of two parts:
Hypo + thesis = Hypothesis
‘Hypo’ means tentative, or subject to verification, and ‘thesis’ means a
statement about the solution of a problem.
The literal meaning of the term hypothesis is therefore a tentative
statement about the solution of a problem. A hypothesis offers a
solution to the problem that is to be verified empirically and based on
some rationale.
In another reading of the word, ‘hypo’ denotes a composition of two or
more variables that is to be verified, and ‘thesis’ denotes the position of
these variables in a specific frame of reference.

This is the operational meaning of the term hypothesis. A hypothesis is
a composition of variables that have some specific position or role,
which is to be verified empirically. It is a proposition about factual and
conceptual elements. A hypothesis has been called a leap into the dark,
a brilliant guess about the solution of a problem.
A tentative generalization or theory formulated about the character of
phenomena under observation is called a hypothesis. It is a statement
temporarily accepted as true in the light of what is known at the time
about the phenomena. It is the basis for planning and action in the
search for new truth.

DEFINITIONS OF HYPOTHESIS
The term hypothesis has been defined in several ways. Some important
definitions have been given in the following paragraphs:
• Hypothesis - A tentative supposition or provisional guess: “It is a
tentative supposition or provisional guess which seems to
explain the situation under observation.” – James E. Creighton
• Hypothesis - A tentative generalization:
Lundberg writes, “A hypothesis is a tentative generalization, the
validity of which remains to be tested. In its most elementary
stage the hypothesis may be any hunch, guess, or imaginative idea
which becomes the basis for further investigation.”
• Hypothesis - A proposition to be put to a test to determine its
validity:
Goode and Hatt: “A hypothesis states what we are looking for. A
hypothesis looks forward. It is a proposition which can be put to
a test to determine its validity. It may prove to be correct or
incorrect.”
• Hypothesis- An expectation about events based on
generalization:
Bruce W. Tuckman, “A hypothesis then could be defined as an
expectation about events based on generalization of the assumed
relationship between variables.”

• Hypothesis - A tentative statement of the relationship between
two or more variables:
“A hypothesis is a tentative statement of the relationship between
two or more variables. Hypotheses are always in declarative
sentence form, and they relate, either generally or specifically,
variables to variables.”
• Hypothesis - A theory when it is stated as a testable proposition.
M. Verma, “A theory when stated as a testable proposition
formally and clearly and subjected to empirical or experimental
verification is known as a hypothesis.”
• Hypothesis - A testable proposition or assumption:
George J. Mouly defines it thus: “Hypothesis is an assumption or
proposition whose testability is to be tested on the basis of the
compatibility of its implications with empirical evidence and with
previous knowledge.”
• Hypothesis - A tentative relationship of two or more variables,
either normative or causal:
“A hypothesis is defined as a statement of the tentative relationship of
two or more variables. The relationship of the variables may be either
a normative or a causal relationship. It should be based on some
rationale.”

What are Dependent and Independent Variables?


Dependent Variable: As the name suggests, this variable varies
depending on other variables or factors.
The dependent variable is also called the response variable (outcome).
Example:
Consider a student's score in an examination, which can vary
based on several factors.

Independent Variable: This variable is not influenced by other
variables; rather, it stands alone and can influence others.
The independent variable is also called the predictor variable (input).
Example:
Consider the same example of a student's score in an examination.
Generally, the score depends on various factors such as hours of
study, attendance, etc. So the time a student spends preparing for the
examination can be considered an independent variable.
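
As a toy illustration (hypothetical numbers), the sketch below fits a
straight line predicting exam score from hours of study, treating hours
as the independent variable and score as the dependent one:

# Minimal sketch: hours of study (independent) vs exam score (dependent).
# The numbers are hypothetical; a least-squares line is fit in closed form.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]       # independent / predictor variable
score = [52.0, 58.0, 65.0, 70.0, 78.0]  # dependent / response variable

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(score) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))
         / sum((x - mean_x) ** 2 for x in hours))
intercept = mean_y - slope * mean_x

print(f"score = {slope:.1f} * hours + {intercept:.1f}")
# score = 6.4 * hours + 45.4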

Introduction to the “Concept Learning” Task


Concept learning is a way to find all the hypotheses (concepts) that are
consistent with the training data. The design of a learning system was
described above.
Why Concept Learning?
A lot of our learning revolves around grouping or categorizing a large
data set. Each concept can be viewed as describing some subset of
objects or events defined over a larger set, for example, the subset of
vehicles that constitute cars.
Alternatively, each dataset has certain attributes. For example, if you
consider a car, its attributes will be color, size, number of seats, etc.,
and these attributes can be treated as binary-valued attributes.
Let's take a more elaborate example, EnjoySport. The attribute
EnjoySport indicates whether a person enjoys his favorite water
sport on a particular day.
The goal is to learn to predict the value of EnjoySport on any given
day based on the values of the day's other attributes.

To simplify,
Task T: Determine the value of EnjoySport for every given day based
on the values of the day’s qualities.

Performance measure P: The total proportion of days for which
EnjoySport is accurately predicted.
Experience E: A collection of days with pre-determined labels
(EnjoySport: Yes/No).

Each hypothesis can be considered a conjunction of six constraints,
specifying values for the six attributes Sky, AirTemp, Humidity, Wind,
Water, and Forecast.

Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm     Normal    Strong  Warm   Same      Yes
Sunny  Warm     High      Strong  Warm   Same      Yes
Rainy  Cold     High      Strong  Warm   Change    No
Sunny  Warm     High      Strong  Cool   Change    Yes

Here the concept = < Sky, AirTemp, Humidity, Wind, Water, Forecast >.

For d binary attributes:
The number of possible instances = 2^d.
The total number of concepts (subsets of the instance space) = 2^(2^d).
Treating this example as having d = 5 binary features gives:
=> The number of possible instances = 2^5 = 32.
=> The total number of concepts = 2^(2^5) = 2^32.
(With the actual EnjoySport attribute values, where Sky takes three
values and the other attributes two, the instance space has 3*2^5 = 96
instances, as computed below.)
The machine does not have to learn all of these concepts. You select a
few of the 2^32 concepts to teach the machine.

The concept we are trying to learn, which must be consistent with all
the training examples, is called the target concept. The set of candidate
hypotheses from which the learner searches for it is called the
hypothesis space.
Hypothesis Space:
Formally, the collection of all feasible legal hypotheses is known as the
hypothesis space. This is the set from which the machine learning
algorithm selects the function that best describes the target function.
Each constraint in a hypothesis will either:
• Indicate with a “?” that any value is acceptable for this attribute.
• Specify a single required value (e.g., Warm) for the attribute.
• Indicate with a “0” that no value is acceptable for this attribute.
For example, the hypothesis that the person enjoys their favorite sport
only on cold days with high humidity (regardless of the values of the
other attributes) is represented by:
< ?, Cold, High, ?, ?, ? >
The most general hypothesis, that every day is a positive example, is
represented by:
< ?, ?, ?, ?, ?, ? >
The most specific possible hypothesis, that no day is a positive
example, is represented by:
< 0, 0, 0, 0, 0, 0 >
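
A minimal sketch (plain Python, names assumed) of this hypothesis
representation and of the test for whether an instance satisfies a
hypothesis:

# Minimal sketch of the <?, value, 0> hypothesis representation.
# '?' accepts any value, a literal value must match exactly, and '0'
# accepts nothing. Names and structure here are illustrative.
def matches(hypothesis, instance):
    return all(h == "?" or h == x       # '0' never equals an attribute value
               for h, x in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")
day = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
print(matches(h, day))  # True: Cold and High match, rest are wildcards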
There are three concerns when choosing a hypothesis space:
• Size: the number of hypotheses to choose from.
• Randomness: stochastic or deterministic.
• Parameters: the number and type of parameters.

Concept Learning as Search:
The main goal is to find the hypothesis that best fits the training data
set. Consider the instances X and hypotheses H in the EnjoySport
learning task.
With three possible values for the attribute Sky and two each for
AirTemp, Humidity, Wind, Water, and Forecast, the instance space X
contains precisely:

=> The number of distinct instances = 3*2*2*2*2*2 = 96.
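
A quick check of these counts (plain Python):

# Quick check of the instance-space sizes quoted above.
values_per_attribute = [3, 2, 2, 2, 2, 2]  # Sky has 3 values, rest have 2

instances = 1
for v in values_per_attribute:
    instances *= v
print(instances)        # 96 distinct instances

# Under the simplified "d = 5 binary attributes" view:
d = 5
print(2 ** d)           # 32 possible instances
print(2 ** (2 ** d))    # 2^32 possible concepts (subsets of instances)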

General-To-Specific Ordering of Hypotheses

Hypotheses can be ordered from the most specific to the most general.
This allows the machine learning algorithm to investigate the
hypothesis space thoroughly without having to enumerate each and
every hypothesis in it, which is impossible when the hypothesis space
is infinitely large.
Let us now discuss the general-to-specific ordering and how to use it
to impose a sense of order on the hypothesis space in any concept
learning problem.
Let us look at our previous EnjoySport example again, with the same
task T, performance measure P, and experience E as defined above, and
with each hypothesis a conjunction of constraints on the six attributes
Sky, AirTemp, Humidity, Wind, Water, and Forecast.

Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm     Normal    Strong  Warm   Same      Yes
Sunny  Warm     High      Strong  Warm   Same      Yes
Rainy  Cold     High      Strong  Warm   Change    No
Sunny  Warm     High      Strong  Cool   Change    Yes

Take a look at the following two hypotheses, written in an abbreviated
three-attribute form < Sky, AirTemp, Wind >:

h1 = < Rainy, Warm, Strong >
h2 = < Rainy, ?, Strong >

The question is how many, and which, examples are classified as
positive by each of these hypotheses (i.e., satisfy them). No training
example above satisfies h1, while example 3 satisfies h2 and is
therefore classified as positive by it.

What is the reason for this? What makes these two hypotheses so
different? The answer lies in how strictly each of them imposes
constraints. As you can see, h1 places more restrictions than h2, so
naturally h2 can classify more instances as positive than h1. In this
case we may assert the following:

“If an instance satisfies h1, it will certainly satisfy h2, but not the
other way around.”

This is because h2 is more general than h1: h2 allows a wider range of
instances than h1. If an instance has the values < Rainy, Freezing,
Strong >, h2 will classify it as positive, but h1 will not be satisfied.
However, if h1 classifies an instance as positive, such as < Rainy,
Warm, Strong >, h2 will certainly classify it as positive as well.

In fact, every instance classified as positive by h1 is likewise classified
as positive by h2. As a result, we conclude that h2 is more general
than h1.

For each instance x in X and hypothesis h in H, we say that x satisfies
h if and only if h(x) = 1.

Definition:
Let hj and hk be boolean-valued functions defined over X. Then hj is
more general than or equal to hk (written hj ≥g hk) if and only if, for
every instance x in X, hk(x) = 1 implies hj(x) = 1.

The letter g stands for “general”. We say that hj is strictly more
general than hk (written hj >g hk) if and only if hj ≥g hk holds but
hk ≥g hj does not.

For example, consider three hypotheses h1, h2, h3. Because every
instance that satisfies h1 also satisfies h2, hypothesis h2 is more
general than h1. In the same way, h2 is more general than h3.
It is worth noting that neither h1 nor h3 is more general than the
other; although the sets of instances satisfied by the two hypotheses
overlap, neither set subsumes the other.

Several key algorithms explore the hypothesis space H by exploiting
the ≥g ordering. One of them is Find-S, where S stands for “specific”,
implying that the goal is to identify the most specific hypothesis.

Since all the instances that satisfy h1 and all that satisfy h3 also
satisfy h2, we can conclude that h2 ≥g h1 and h2 ≥g h3.
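
A minimal sketch (assuming the representation above: '?' wildcard,
literal values, '0' for nothing) of the ≥g test for this attribute-vector
hypothesis language:

# Minimal sketch of the more-general-than-or-equal-to (>=g) test for
# hypotheses in the <?, value, 0> language. A constraint in hj covers a
# constraint in hk when hj accepts every value hk accepts.
def covers(cj, ck):
    if cj == "?":
        return True          # '?' accepts everything
    if ck == "0":
        return True          # '0' accepts nothing, so anything covers it
    return cj == ck          # otherwise the literal values must agree

def more_general_or_equal(hj, hk):
    return all(covers(cj, ck) for cj, ck in zip(hj, hk))

h1 = ("Rainy", "Warm", "Strong")
h2 = ("Rainy", "?", "Strong")
print(more_general_or_equal(h2, h1))  # True:  h2 >=g h1
print(more_general_or_equal(h1, h2))  # False: h1 is strictly less general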

Finding a Maximally Specific Hypothesis: Find-S

The Find-S algorithm is a concept learning algorithm in machine
learning. It identifies the hypothesis that best matches all of the
positive examples.

We discuss here the algorithm and some examples of Find-S, an
algorithm to find a maximally specific hypothesis.
To understand it from scratch, let's look at the terminology involved.

Hypothesis:
A hypothesis is usually represented with an ‘h’. In supervised machine
learning, a hypothesis is a function that best characterizes the target.

For example, consider a coordinate plane showing the output as
positive or negative for a given task. The hypothesis space is made up
of all the legal ways in which we might partition the coordinate plane
to predict the outcome of the test data. Each conceivable partition (in
the accompanying figure, each gray line) is referred to as a hypothesis.
Specific Hypothesis:
A hypothesis h is a most specific hypothesis if it covers all of the
positive examples and there is no other hypothesis h′ that covers all
of the positive examples and is strictly more specific than h.
The specific hypothesis fills in concrete details for the variables given
in the hypothesis.

Find-S:
The Find-S algorithm considers only the positive examples. It starts
with the most specific hypothesis and, whenever that hypothesis fails
to cover an observed positive training example, generalizes it just
enough to cover the example.

Representations:
• The most specific hypothesis is represented using ϕ.
• The most general hypothesis is represented using ?.

‘?’ means that any value is accepted for the attribute, whereas ϕ means
no value is accepted for the attribute.

Let’s have a look at the algorithm of Find-S:


1. Initialize the value of the hypothesis for all attributes with the most
specific one. That is,
h0 = < ϕ, ϕ, ϕ, ϕ…….. >
2. Take the next example, if the taken example is negative leave them
and move on to another example without changing our hypothesis
for the step.

3. Now, if the taken example is a positive example, then


17
For each attribute, check if the value of the attribute is equal to that
of the value we took in our hypothesis.

If the value is equal then we’ll use the same value for the attribute in
our hypothesis and move to another attribute. If the value of the
attribute is not equal to that of the value in our specific hypothesis
then change the value of our attribute in a specific hypothesis to the
most general hypothesis (?).

After we’ve completed all of the training examples, we’ll have a final
hypothesis that we can use to categorize the new ones.

Let’s have a look at an example to see how Find-S works.

Consider the following data set, which contains information about the
best day for a person to enjoy their preferred sport.

Sky Air temp Humidity Wind Water Forecast EnjoySport

Sunny Warm Normal Strong Warm Same Yes

Sunny Warm High Strong Warm Same Yes

Rainy Cold High Strong Warm Change No

Sunny Warm High Strong Cool Change Yes

Now initializing the value of the hypothesis for all attributes with the
most specific one.
h0 = < ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>

Consider example 1. The attribute values are < Sunny, Warm, Normal,
Strong, Warm, Same >. Since its target value (EnjoySport) is Yes, it is
a positive example.

Our initial hypothesis is more specific than this example, so we must
generalize it. The hypothesis becomes:
h1 = < Sunny, Warm, Normal, Strong, Warm, Same >

The second training example (also positive) forces the algorithm to
generalize h further, this time by replacing with ‘?’ any attribute value
in h that is not matched by the new example.

The attribute values are < Sunny, Warm, High, Strong, Warm, Same >.

The refined hypothesis is now:
h2 = < Sunny, Warm, ?, Strong, Warm, Same >

Consider example 3. The attribute values are < Rainy, Cold, High,
Strong, Warm, Change >. Since the target value is No, it is a negative
example.
h3 = < Sunny, Warm, ?, Strong, Warm, Same > (same as h2)

Every negative example is simply ignored by the Find-S algorithm, so
no change to h is needed in response to a negative example.

The fourth (positive) example leads to a further generalization of h in
our Find-S trace.

Example 4 is < Sunny, Warm, High, Strong, Cool, Change >, which is
again a positive example. Every attribute is compared with the current
hypothesis, and wherever there is a mismatch the attribute is replaced
with the general value ‘?’. After completing the procedure, the
following hypothesis emerges:
h4 = < Sunny, Warm, ?, Strong, ?, ? >

Therefore the final hypothesis is h = < Sunny, Warm, ?, Strong, ?, ? >.

At each step, the hypothesis is generalized only as far as necessary to
cover the new positive example. As a result, the hypothesis at each
step is the most specific hypothesis consistent with the training
examples seen so far (hence the name Find-S).

Find-S will always return the most specific hypothesis within H that
is consistent with the positive training examples.
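
A minimal runnable sketch of Find-S on the EnjoySport data, with the
representation assumed as above and "phi" standing in for ϕ:

# Minimal sketch of Find-S on the EnjoySport data. "phi" stands in for
# the most specific constraint ϕ; "?" is the fully general constraint.
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def find_s(examples, n_attrs=6):
    h = ["phi"] * n_attrs                 # h0: most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue                      # negative examples are ignored
        for i, value in enumerate(x):
            if h[i] == "phi":
                h[i] = value              # first positive: copy the example
            elif h[i] != value:
                h[i] = "?"                # mismatch: generalize to '?'
    return h

print(find_s(examples))
# ['Sunny', 'Warm', '?', 'Strong', '?', '?']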

What is Version Space?

The version space sits between the general and the specific
boundaries. Rather than committing to a single hypothesis, it is the
list of all feasible hypotheses consistent with the training data.

With respect to hypothesis space H and training examples D, the
version space, denoted VS(H,D), is the subset of hypotheses from H
that are consistent with the training examples in D.

For example, consider the classic EnjoySport dataset again.

In this example we have two hypotheses from H, both of which are
consistent with the training dataset:

h1 = < Sunny, Warm, ?, Strong, ?, ? > and
h2 = < ?, Warm, ?, Strong, ?, ? >

The collection of such consistent hypotheses, h1, h2, and so on, is
referred to as a version space.

Candidate Elimination Learning Algorithm

The candidate elimination learning algorithm is a method for learning
concepts from data in a supervised fashion. Here we explain the
algorithm with examples.

Given a hypothesis space H and a collection E of examples, the
candidate elimination procedure builds the version space
incrementally.

The examples are introduced one by one, each potentially shrinking
the version space by deleting hypotheses that contradict the example.
For each new example, the method updates the general and specific
boundaries.

To understand the algorithm better, let us look at some terminology.

Specific Hypothesis:
A hypothesis h is a most specific hypothesis if it covers all of the
positive examples and there is no other hypothesis h′ that covers all
of the positive examples and is strictly more specific than h. The
specific hypothesis fills in concrete details for all the variables given
in the hypothesis. The most specific boundary is initialized as:
S = < ϕ, ϕ, ϕ, ..., ϕ >

General Hypothesis:
In general, a hypothesis is an explanation for something. The general
hypothesis describes the relationship between the key variables only
loosely. “I want to watch Avengers”, for example, is a general
hypothesis for selecting a movie. The most general boundary is
initialized as:
G = < ?, ?, ?, ..., ? >

Representations:
• The most specific hypothesis is represented using ϕ.
• The most general hypothesis is represented using ?.

Why the Candidate Elimination Algorithm?

The candidate elimination learning algorithm addresses several of the
limitations of Find-S.

Although Find-S outputs a hypothesis from H that is consistent with
the training examples, it is just one of many hypotheses from H that
might fit the training data equally well.

The key idea in the candidate elimination algorithm is to output a
description of the set of all hypotheses consistent with the training
examples.

Candidate Elimination:
Unlike the Find-S algorithm, the candidate elimination algorithm
considers not just positive but negative examples as well. It relies on
the concept of the version space.
At the end of the algorithm, we obtain both the specific and the
general boundaries as our final solution.
For a positive example, we move from the most specific hypothesis
toward the more general.
For a negative example, we move from the most general hypothesis
toward the more specific.

Candidate Elimination Algorithm:

1. Initialize both the specific and the general boundaries, depending
on the number of attributes:
S = < ϕ, ϕ, ϕ, ..., ϕ >
G = < ?, ?, ?, ..., ? >

2. Take the next example. If it is positive, generalize the specific
boundary to cover it (and remove from G any hypothesis inconsistent
with the example).

3. If the example is negative, specialize the hypotheses in the general
boundary so that they exclude it (S is left unchanged).

Let’s have a look at an example to see how the Candidate Elimination
Algorithm works.

1. Initializing both specific and general hypotheses.


G0 = < <?, ?, ?, ?, ?, ?> , <?, ?, ?, ?, ?, ?> , <?, ?, ?, ?, ?, ?>,
<?, ?, ?, ?, ?, ?> , <?, ?, ?, ?, ?, ?> , <?, ?, ?, ?, ?, ? >>
S0 = < ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>

2. When the first training example is supplied (in this case, a positive
example), the candidate elimination method evaluates the S boundary
and finds that it is too specific: it fails to cover the positive example.
The boundary is therefore shifted to the least general hypothesis that
covers this new example. S1 denotes the updated boundary.

No update of G is needed for this example, as G0 correctly covers the
training example.
G1 = G0 = < <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>,
<?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?> >
S1 = < Sunny, Warm, Normal, Strong, Warm, Same >

3. When the second (also positive) training example is observed, it has
the similar effect of generalizing S to S2 while leaving G intact
(G2 = G1 = G0).
G2 = G0 = < <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>,
<?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ?> >
S2 = < Sunny, Warm, ?, Strong, Warm, Same >

4. Training example 3 is negative. It shows that the G boundary of the
version space is overly general: the hypotheses in G must be
specialized until they correctly classify this new negative example
(while remaining more general than S2).
G3 = < <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>,
<?, ?, ?, ?, ?, Same> >
S3 = S2 = < Sunny, Warm, ?, Strong, Warm, Same >

5. The fourth training example, a positive one, generalizes the S
boundary of the version space. It also results in the removal of one
member of the G boundary, <?, ?, ?, ?, ?, Same>, since that member
fails to cover the new positive example.
G4 = < <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> >
S4 = < Sunny, Warm, ?, Strong, ?, ? >

Finally, the learned version space is the set of hypotheses bounded by
G4 and S4. (In the usual diagram, the whole version space is drawn
between the S4 and G4 boundaries.) The order in which the training
examples are presented has no impact on the learned version space.
The final boundaries are:
G = < <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> >
S = < Sunny, Warm, ?, Strong, ?, ? >
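
A minimal runnable sketch of candidate elimination on the same data.
The attribute domains, names, and the None-for-ϕ encoding are
assumptions of this sketch; it also omits the pruning of redundant
boundary members, which this dataset never needs:

# Minimal sketch of candidate elimination for conjunctive hypotheses
# on the EnjoySport data. '?' is the wildcard; None stands in for ϕ.
DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def generalize(s, x):
    # Minimal generalization of S covering positive example x.
    return tuple(v if c is None else (c if c == v else "?")
                 for c, v in zip(s, x))

def specializations(g, x, s):
    # Minimal specializations of g that exclude negative example x and
    # stay more general than (or equal to) the specific boundary s.
    for i, (c, v) in enumerate(zip(g, x)):
        if c == "?":
            for value in DOMAINS[i]:
                if value != v and s[i] == value:
                    yield g[:i] + (value,) + g[i + 1:]

def candidate_elimination(examples):
    S = (None,) * 6                      # most specific boundary
    G = {("?",) * 6}                     # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = generalize(S, x)
        else:
            G = {h for g in G
                 for h in (specializations(g, x, S) if matches(g, x)
                           else [g])}
    return S, G

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples)
print("S =", S)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print("G =", G)  # the two G4 members above, in some order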
Inductive Learning:
There are different ways learning can be accomplished in a computer
program; inductive learning is one of them.
Inductive learning: In this method the computer is fed labelled data
(solved examples). A learning algorithm goes through the given data
set and, with the help of these solved examples, produces a machine
learning model. This model in turn can take unlabelled data and
should be able to produce labels for it with significant accuracy. The
figure below shows a schematic of inductive learning.

In inductive learning we are given input samples (x) and output
samples (f(x)), and the objective is to estimate the function f. The goal
is to generalize from the samples so that the learned mapping can be
used to estimate the output for fresh samples in the future.

In practice, estimating the function exactly is nearly always too
difficult, so we seek very good approximations of it.

Classification of machine learning based on what a model is trying
to predict:

Regression:
For a given dataset a regression model outputs a real number, for
example a model whose result is a temperature.
Binary Classification:
For a given dataset a binary classification model outputs one of two
labels. For example, a model can take an image as input and assign
one of the two labels ‘Cat’ and ‘Not cat’. In a binary model the
negation of one label must imply the other label. True/false is another
good example, but ‘cat’ versus ‘not dog’ is not a good binary
classification.

Multiclass Classification:
If there are more than two labels that the model can associate a given
example with, this is multiclass classification. Examples include a
model that predicts the weather or identifies a color.

Ranking:
Given a preference rule, the algorithm should output a sorted order.
Examples include movie recommendations, Google searches, and
name suggestions on Facebook. Basically the model takes an unsorted
dataset and sorts it based on the rules of preference. Buying
recommendations on Amazon could also be made using statistics,
such as how many people bought one item after another. The point is
that with statistics someone must build a filter that analyzes all the
data and produces a preference, whereas with machine learning the
machine does this job for us.

Some practical examples of induction are:

Assessment of credit risk:
The x represents the customer's properties.
The f(x) is whether or not the customer is accepted for credit.

Diagnosis of disease:
The x represents the patient's characteristics.
The f(x) is the illness they are afflicted with.

Face recognition (recognizing someone's face):
Bitmaps of people's faces make up the x.
The f(x) gives the face a name.

Automatic steering:
The x are bitmap images from a camera in front of the car.
The f(x) is the degree to which the steering wheel should be turned.

When Should You Use Inductive Learning?

There are problems where inductive learning is not a good idea, so it
is important to know when to use and when not to use supervised
machine learning. Four kinds of problems where inductive learning
might be a good idea:

• Problems where there is no human expert. If people do not know the
answer, they cannot write a program to solve it. These are areas of
true discovery.
• Problems where humans can perform the task but no one can
describe how. There are tasks humans can do that computers cannot
do, or cannot do well. Examples include riding a bike or driving a car.
• Problems where the desired function changes frequently. Humans
could describe it and could write a program to do it, but the problem
changes too often, so that is not cost-effective. An example is the
stock market.
• Problems where each user needs a custom function. It is not cost-
effective to write a custom program for each user. An example is
movie or book recommendations on Netflix or Amazon.

The Essence of Inductive Learning

We can write a program that works perfectly for the data that we have.
This function will be maximally overfit, but we have no idea how well
it will work on new data; it will likely work very badly, because we
may never see the same examples again.
The data alone is not enough: if you assume nothing about the
problem, you can predict anything you like.
In practice we are not naive. There is an underlying problem, and we
are interested in an accurate approximation of the function. There is a
double-exponential number of possible classifiers in the number of
input states, so finding a good approximation of the function is very
difficult.
There are classes of hypotheses that we can try, that is, the forms the
solution may take, or its representation. We cannot know in advance
which is most suitable for our problem; we have to use
experimentation to discover what works.

Two perspectives on inductive learning:
• Learning is the removal of uncertainty. Having data removes some
uncertainty; by selecting a class of hypotheses, we remove more.
• Learning is guessing a good and small hypothesis class. It requires
guessing: we don't know the solution, so we must use a trial-and-error
process. If you knew the domain with certainty, you would not need
learning. But we are not guessing in the dark.
You could still be wrong:
• Our prior knowledge could be wrong.
• Our guess of the hypothesis class could be wrong.
In practice we start with a small hypothesis class and slowly grow it
until we get a good result.

A Framework for Studying Inductive Learning

Terminology used in machine learning:
• Training example: a sample of x together with its output from the
target function.
• Target function: the mapping function f from x to f(x).
• Hypothesis: an approximation of f, a candidate function.
• Concept: a boolean target function; positive examples and negative
examples correspond to the 1/0 class values.
• Classifier: the learning program outputs a classifier that can be used
to classify.
• Learner: the process that creates the classifier.
• Hypothesis space: the set of possible approximations of f that the
algorithm can create.
• Version space: the subset of the hypothesis space that is consistent
with the observed data.

Some of the fundamental questions for inductive inference are:
• What happens if the target concept isn’t in the hypothesis space?
• Is it possible to avoid this problem by adopting a hypothesis space
that contains all potential hypotheses?
• What effect does the size of the hypothesis space have on the
algorithm’s capacity to generalize to unseen instances?
• What effect does the size of the hypothesis space have on the
number of training instances required?

Inductive Bias in Machine Learning


Introduction
The phrase “inductive bias” refers to a collection of (explicit or
implicit) assumptions made by a learning algorithm in order to
conduct induction, or generalize a limited set of observations
(training data) into a general model of the domain.

The bias in the answer that stems from the algorithm used to build the
model is called inductive bias. For example, suppose we give a few
people pictures of birds and animals and ask them to label them into
two groups. Some people may ask whether the organism can fly or not
and label on that basis, while others may ask whether the organism is
a mammal or not and end up with a different labelling. This kind of
discrepancy in results is a case of inductive bias.

What is Inductive Bias?

As stated above, inductive bias is the set of assumptions that lets a
learning algorithm conduct induction, i.e., generalize a limited set of
observations (training data) into a general model of the domain.

Induction would be impossible without such a bias, because
observations can generally be extended in a variety of ways.
Predictions for new scenarios could not be made if all of these options
were treated equally, that is, without any bias in the sense of a
preference for certain forms of generalization (representing prior
information about the target function to be learned).

The idea of inductive bias is to let the learner generalize beyond the
observed training examples to infer labels for new examples.
We write ‘>’ for “inductively inferred from”:
x > y means y is inductively inferred from x.

Types of Inductive Bias:


• Maximum conditional independence: It aims to maximize
conditional independence if the hypothesis can be put in a Bayesian
framework. The Naive Bayes classifier employs this bias.
• Minimum cross-validation error: Select the hypothesis with the
lowest cross-validation error when deciding between hypotheses.
Despite the fact that cross-validation appears to be bias-free, the “no
free lunch” theorems prove that cross-validation is biased.
• Maximum margin: While creating a border between two classes,
try to make the boundary as wide as possible. In support vector
machines, this is the bias. The idea is that distinct classes are usually
separated by large gaps.
• Minimum hypothesis description length: When constructing a
hypothesis, try to keep its description as short as possible. The
assumption is that simpler hypotheses are more likely to be correct,
although arguably simpler models are mainly easier to test rather than
necessarily “more likely to be true”. Compare the principle of
Occam's razor.
• Minimum features: features should be removed unless there is
strong evidence that they are helpful. Feature selection methods are
based on this premise.
• Nearest neighbors: Assume that most of the examples in a small
neighborhood in feature space belong to the same class. If the class of
an instance is unknown, assume that it belongs to the same class as
the majority of the examples in its immediate neighborhood. The
k-nearest neighbors algorithm employs this bias: cases that are close
to each other are assumed to belong to the same class.
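
A minimal sketch of this nearest-neighbor bias (toy 2-D points, k = 3;
all data hypothetical):

# Minimal sketch of the nearest-neighbor inductive bias: a query point
# is assigned the majority class of its k closest training points.
from collections import Counter

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),   # hypothetical 2-D data
         ((0.9, 1.1), "A"), ((5.0, 5.0), "B"),
         ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

def knn_predict(query, train, k=3):
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((1.1, 1.0), train))  # 'A': its neighborhood is class A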

Key issues in machine learning:

• What is a good hypothesis space?
• What algorithms work with that space?
• What can we do to optimize accuracy on unseen data?
• How do we gain confidence in the model?
• Are there learning problems that are computationally intractable?
• How can we formulate application problems as machine learning
problems?

Learning Conjunctive Concepts

Humans are able to distinguish between different “things”, e.g., chair,
table, sofa, book, newspaper, car, airplane, and so on. There is also no
doubt that humans have to learn how to distinguish “things”.
Therefore, we ask whether this particular learning problem allows an
algorithmic solution, too. That is, we specify the learner to be an
algorithm. Furthermore, we specify the learning domain to be the set
of all things.

However, since we aim to model learning, we have to convert “real
things” into mathematical descriptions of things. This can be done as
follows.

We fix some language to express a finite list of properties.

Afterwards, we decide which of these properties are relevant for the
particular things we want to deal with, and which of them have to be
fulfilled or not to be fulfilled, respectively.

For example, the list of properties may be fixed as follows:
- possesses 4 legs, - possesses a backrest, - has brown color,
- possesses 4 wheels, - needs fuel, - possesses a seat,
- possesses wings, ..., - has more than 100 pages.

Now, we can answer

“What is a chair?”
by deciding which of the properties are relevant for the concept “chair,”
and which of them have to be fulfilled or not to be fulfilled, respectively.

Hence, we obtain:

[1] possesses 4 legs - yes
[2] possesses a backrest - yes
[3] has brown color - irrelevant
[4] possesses 4 wheels - no
[5] needs fuel - no
[6] possesses a seat - yes
[7] possesses wings - no
...
[n] has more than 100 pages - no
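
A minimal sketch of this idea: a conjunctive concept over boolean
properties, where each property is required, forbidden, or irrelevant.
The property names and example objects below are illustrative:

# Minimal sketch: a conjunctive concept over boolean properties.
# Each property is required (True), forbidden (False), or irrelevant
# (absent from the dict). The property names are illustrative.
chair = {          # the concept "chair" as a conjunction of constraints
    "has_4_legs": True,
    "has_backrest": True,
    "has_4_wheels": False,
    "needs_fuel": False,
    "has_seat": True,
    "has_wings": False,
    # "is_brown" is irrelevant, so it is simply omitted
}

def satisfies(concept, thing):
    # A thing satisfies the concept iff every constrained property agrees.
    return all(thing.get(prop, False) == required
               for prop, required in concept.items())

office_chair = {"has_4_legs": True, "has_backrest": True,
                "has_seat": True, "is_brown": False}
car = {"has_4_wheels": True, "needs_fuel": True, "has_seat": True}

print(satisfies(chair, office_chair))  # True
print(satisfies(chair, car))           # False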
