Unit-2 Inductive Classification (February 19, 2024)
Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine
learning in practice as a data analyst or data scientist. In practice, the
process often looks like:
1. Start Loop
a) Understand the domain, prior knowledge, and goals. Talk to
domain experts. Often the goals are very unclear, and you often have
more things to try than you can possibly implement.
b) Data integration, selection, cleaning and pre-processing.
This is often the most time-consuming part. It is important to have
high-quality data. The more data you have, the dirtier it tends to
be: garbage in, garbage out.
c) Learning models. The fun part. This part is very mature. The
tools are general.
d) Interpreting results. Sometimes it does not matter how the
model works as long as it delivers results. Other domains require
that the model be understandable, and you will be challenged by
human experts.
e) Consolidating and deploying discovered knowledge. The
majority of projects that are successful in the lab are not used in
practice. It is very hard to get something used.
2. End Loop
It is not a one-shot process, it is a cycle. You need to run the loop until
you get a result that you can use in practice. Also, the data can change,
requiring a new loop.
Designing a Learning System in Machine Learning :
According to Tom Mitchell, “A computer program is said to learn
from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E.”
Example: In Spam E-Mail detection,
Step 1) Choosing the Training Experience: The first and most important
task is to choose the training data or training experience that will be
fed to the machine learning algorithm. The data or experience we feed to
the algorithm has a significant impact on the success or failure of the
model, so the training data or experience should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
• The training experience may provide direct or indirect
feedback regarding the choices made. For example, while playing
chess the training data provides feedback such as: had this move been
chosen instead, the chances of success would have increased.
• The second important attribute is the degree to which the learner
controls the sequence of training examples. For example, when
training data is first fed to the machine, accuracy is very low; but as
the machine gains experience by playing again and again against
itself or an opponent, the algorithm receives feedback and controls
the chess game accordingly.
• The third important attribute is how well the training experience
represents the distribution of examples over which the final
performance will be measured. For example, a machine learning
algorithm gains experience by going through a number of different
cases and examples. By passing through more and more examples,
the algorithm gains more experience and its performance increases.
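Mitchell's T, P, and E can be made concrete with a toy spam-detection sketch. Everything below (the word-count "learner", the e-mails, and the helper names `train`, `predict`, `accuracy`) is invented for illustration; the only point is that the performance measure P (accuracy) improves as the experience E (labelled e-mails) grows.

```python
from collections import Counter

# T: label an e-mail spam or not. P: accuracy. E: labelled e-mails.
# The toy "learner" just counts which words appear more often in spam.

def train(emails):
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in emails:
        (spam_words if is_spam else ham_words).update(text.split())
    return spam_words, ham_words

def predict(model, text):
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.split())
    return score > 0   # positive score means "spam"

def accuracy(model, labelled):          # the performance measure P
    return sum(predict(model, t) == y for t, y in labelled) / len(labelled)

experience = [                          # the experience E (made-up data)
    ("win money now", True), ("free prize win", True),
    ("meeting at noon", False), ("lunch with team", False),
]
test_set = [("win free money", True), ("free lunch meeting", False)]

small = train(experience[:2])           # little experience: spam only
full = train(experience)                # more experience: spam and ham
print(accuracy(small, test_set), accuracy(full, test_set))   # 0.5 1.0
```

With only spam examples the model calls everything containing a known word spam; adding non-spam experience corrects it, so P rises from 0.5 to 1.0.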
Introduction of Hypothesis
MEANING OF HYPOTHESIS
The word hypothesis is built from two Greek roots and denotes a sort of
‘sub-statement’, for it is the presumptive statement of a proposition
which the investigation seeks to prove.
The word hypothesis consists of two words:
Hypo + thesis = Hypothesis
‘Hypo’ means tentative, or subject to verification, and ‘thesis’ means a
statement about the solution of a problem.
Thus the literal meaning of the term hypothesis is a tentative statement
about the solution of a problem. A hypothesis offers a solution to the
problem that is to be verified empirically and based on some rationale.
In another reading of the word: ‘hypo’ means a composition of two or
more variables that is to be verified, and ‘thesis’ means the position of
these variables in a specific frame of reference.
This is the operational meaning of the term hypothesis: a hypothesis is a
composition of variables that have some specific position or role, which
is to be verified empirically. It is a proposition about factual and
conceptual elements. A hypothesis is sometimes called a leap into the
dark, a brilliant guess about the solution of a problem.
A tentative generalization or theory formulated about the character of
phenomena under observation is called a hypothesis. It is a statement
temporarily accepted as true in the light of what is known at the time
about the phenomena, and it is the basis for planning and action in the
search for new truth.
DEFINITIONS OF HYPOTHESIS
The term hypothesis has been defined in several ways. Some important
definitions have been given in the following paragraphs:
• Hypothesis -A tentative supposition or provisional guess “It is a
tentative supposition or provisional guess which seems to
explain the situation under observation.” – James E. Greighton
• Hypothesis- A Tentative generalization.
A Lungberg thinks “A hypothesis is a tentative generalization the
validity of which remains to be tested. In its most elementary
stage the hypothesis may be any hunch, guess, imaginative idea
which becomes the basis for further investigation.”
• Hypothesis - A proposition is to be put to test to determine its
validity:
Goode and Han, “A hypothesis states what we are looking for. A
hypothesis looks forward. It is a proposition which can be put to
a test to determine its validity. It may prove to be correct or
incorrect.”
• Hypothesis- An expectation about events based on
generalization:
Bruce W. Tuckman, “A hypothesis then could be defined as an
expectation about events based on generalization of the assumed
relationship between variables.”
• Hypothesis -A tentative statement of the relationship between
two or more variables:
“A hypothesis is a tentative statement of the relationship between
two or more variables. Hypotheses are always in declarative
sentence form, and they relate, either generally or specifically,
variables to variables.”
• Hypothesis - A theory when it is stated as a testable proposition.
M. Verma, “A theory when stated as a testable proposition
formally and clearly and subjected to empirical or experimental
verification is known as a hypothesis.”
• Hypothesis -A testable proposition or assumption.
George J. Mouly defines that, “Hypothesis is an assumption or
proposition whose testability is to be tested on the basis of the
compatibility of its implications with empirical evidence and with
previous knowledge.”
• Hypothesis - Tentative relationship of two or more variables,
either normative or causal:
Independent Variable: This is a variable that is not influenced by
other variables; rather, it stands alone and can influence others.
The independent variable is also called the predictor variable (input).
Example:
Consider the same example of a student’s score in an examination.
Generally, the score depends on various factors such as hours of study,
attendance, etc. So the time spent by the student preparing for the
examination can be considered an independent variable.
To simplify,
Task T: Determine the value of EnjoySport for every given day based
on the values of the day’s qualities.
9
Performance measure P: the total proportion of days for which
EnjoySport is accurately predicted.
Experience E: A collection of days with pre-determined labels
(EnjoySport: Yes/No).
Here the concept = < Sky, AirTemp, Humidity, Wind, Water, Forecast >.
With three possible values for the attribute Sky and two each for
AirTemp, Humidity, Wind, Water, and Forecast, the instance space X
contains precisely 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances.
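This count can be checked mechanically. The attribute values below follow Mitchell's EnjoySport example:

```python
from itertools import product

# Attribute domains for EnjoySport (values per Mitchell's example)
domains = [
    ['Sunny', 'Cloudy', 'Rainy'],   # Sky
    ['Warm', 'Cold'],               # AirTemp
    ['Normal', 'High'],             # Humidity
    ['Strong', 'Weak'],             # Wind
    ['Warm', 'Cool'],               # Water
    ['Same', 'Change'],             # Forecast
]

# Every instance is one combination of attribute values.
instances = list(product(*domains))
print(len(instances))   # 3 * 2**5 = 96
```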
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The question is how many, and which, examples are classified as
positive by each of these hypotheses (i.e., satisfy them). Only
example 4 satisfies h1, whereas both examples 3 and 4 satisfy h2
and are classified as positive.
What is the reason behind this? What makes these two hypotheses so
different? The answer lies in how strictly each of them imposes
constraints. As you can see, h1 places more restrictions than h2, so
naturally h2 can classify more cases as positive than h1. In this
case, we can assert the following:
“If an example meets h1, it will almost certainly meet h2, but not the
other way around.”
This is because h2 is more general than h1: h2 covers a wider range
of instances than h1. If an instance has the values
< Rainy, Freezing, Strong >, h2 will classify it as positive, but h1
will not be satisfied.
However, if h1 classifies an instance as positive, such as
< Rainy, Warm, Strong >, h2 will almost certainly classify it as
positive as well.
Definition:
Let hj and hk be boolean-valued functions defined over X. Then hj is
more general than or equal to hk (written hj ≥g hk) if and only if
every instance that satisfies hk also satisfies hj:
(∀x ∈ X) [ (hk(x) = 1) → (hj(x) = 1) ]
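For conjunctive hypotheses like those used here, the ≥g test reduces to an attribute-wise check. The following sketch demonstrates it; the function name and the particular h1/h2 pair are our own illustration (chosen to match the < Rainy, Warm, Strong > discussion above):

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk: every instance satisfying hk also satisfies hj.
    For conjunctive hypotheses this is an attribute-wise check:
    each attribute of hj is either '?' or equal to hk's value."""
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Rainy', 'Warm', 'Strong')   # the more constrained hypothesis
h2 = ('Rainy', '?', 'Strong')      # the more general hypothesis

print(more_general_or_equal(h2, h1))   # True:  h2 >=_g h1
print(more_general_or_equal(h1, h2))   # False: not the other way around
```

Note how h2 also covers < Rainy, Freezing, Strong >, which h1 rejects, matching the asymmetry described in the text.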
There are a handful of key algorithms that explore the hypothesis
space H by making use of the ≥g relation. One of them is Find-S,
where S stands for specific, implying that the goal is to identify
the most specific hypothesis.
We can observe that all the instances that satisfy h1 and h3 also
satisfy h2, so we can conclude that h2 ≥g h1 and h2 ≥g h3.
Hypothesis:
It is usually represented with an ‘h’. In supervised machine learning,
a hypothesis is a function that best characterizes the target.
Find-S:
The Find-S algorithm is a concept-learning algorithm in machine
learning. It identifies the hypothesis that best matches all of the
positive examples; negative examples are ignored entirely.
Representations:
• The most specific hypothesis is represented using ϕ.
• The most general hypothesis is represented using ?.
If the attribute value in the example equals the value in our
hypothesis, we keep that value and move on to the next attribute. If
the attribute value does not equal the value in our specific
hypothesis, we change that attribute in the hypothesis to the most
general value (?).
After we have processed all of the training examples, we have a final
hypothesis that we can use to classify new ones.
Consider the following data set, which contains information about the
best day for a person to enjoy their preferred sport.
Now initialize the hypothesis for all attributes with the most
specific value:
h0 = < ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>
Consider example 1. The attribute values are < Sunny, Warm, Normal,
Strong, Warm, Same >. Since its target class (EnjoySport) value is
Yes, it is a positive example.
Our initial hypothesis is more specific than this example, so we must
generalize it. As a result, the hypothesis becomes:
h1 = < Sunny, Warm, Normal, Strong, Warm, Same >
The second training example (also positive) compels the algorithm to
generalize h further, this time by replacing with “?” any attribute
value in h that is not matched by the new example.
The attribute values are < Sunny, Warm, High, Strong, Warm, Same >, so:
h2 = < Sunny, Warm, ?, Strong, Warm, Same >
Consider example 3. The attribute values are < Rainy, Cold, High,
Strong, Warm, Change >. Since the target class value is No, it is a
negative example, which Find-S ignores:
h3 = < Sunny, Warm, ?, Strong, Warm, Same > (same as h2)
FIND-S will always return the most specific hypothesis inside H that
matches the positive training instances.
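The procedure above can be sketched in a few lines of Python. The data tuples repeat the three training examples from the walkthrough; the function name `find_s` and the `'phi'`/`'?'` encoding are our own.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis consistent
    with the positive examples. Negative examples are ignored."""
    h = None
    for x, label in examples:
        if label != 'Yes':      # Find-S considers only positive examples
            continue
        if h is None:           # first positive example: copy it verbatim
            h = list(x)
        else:                   # generalize mismatching attributes to '?'
            h = [hv if hv == xv else '?' for hv, xv in zip(h, x)]
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
```

The result matches h3 from the walkthrough: the negative example leaves the hypothesis untouched.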
In the above example, we have two hypotheses from H, both of which
are consistent with the training dataset.
The examples are introduced one by one, each potentially shrinking
the version space by deleting hypotheses that contradict the example.
For each new example, the candidate elimination method updates the
general and specific boundaries.
To understand the algorithm better, let us have a look at some
terminologies and what it means.
Specific Hypothesis:
If a hypothesis h covers none of the negative examples, and there is
no other hypothesis h′ that also covers none of the negative examples
such that h is strictly more general than h′, then h is said to be a
most specific hypothesis.
General Hypothesis:
In general, a hypothesis is an explanation for something. The general
hypothesis describes the relationship between the key variables in
broad terms. For example, “I want to watch a movie” is a general
hypothesis for selecting a movie.
G = < ‘?’, ‘?’, ‘?’, …..’?’>
Although the Find-S algorithm outputs a hypothesis from H that is
consistent with the training examples, it is just one of many
hypotheses from H that fit the training data equally well.
Candidate Elimination:
Unlike the Find-S algorithm, the Candidate Elimination algorithm
considers not just positive but also negative examples. It relies on
the concept of a version space.
At the end of the algorithm, we get both specific and general
hypotheses as our final solution.
For a positive example, we move from the most specific hypothesis
to the most general hypothesis.
For a negative example, we move from the most general hypothesis
to the most specific hypothesis.
Let’s have a look at an example to see how the Candidate Elimination
Algorithm works.
1. Initialize the boundary sets with the most specific and most general
hypotheses: S0 = < ϕ, ϕ, ϕ, ϕ, ϕ, ϕ > and G0 = < ?, ?, ?, ?, ?, ? >.
2. When the first training example is supplied (in this case, a positive
example), the Candidate Elimination method evaluates the S boundary,
determines that it is too specific to cover the positive example, and
minimally generalizes it.
As a result, the hypotheses in the G boundary must be specialized
until they correctly classify this new negative example.
G3 = < <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> >
S3 = S2 = < Sunny, Warm, ?, Strong, Warm, Same >
The above diagram depicts the whole version space, including the
hypotheses bounded by S4 and G4. The order in which the training
examples are given has no impact on the learned version space.
The final hypothesis is,
G = < <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> >
S = < Sunny, Warm, ?, Strong, ?, ? >
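The run above can be reproduced with a compact sketch of Candidate Elimination for conjunctive hypotheses. The data repeats the EnjoySport examples (the fourth, positive, example follows Mitchell's version of the dataset), and all helper names are our own. The sketch omits the boundary cross-checks of the full algorithm (removing inconsistent S members and non-maximal G members), which this dataset does not exercise.

```python
def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 >=_g h2; 'phi' covers nothing, so anything is >=_g it."""
    if 'phi' in h2:
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s to cover positive instance x."""
    return tuple(v if a == 'phi' else (a if a == v else '?')
                 for a, v in zip(s, x))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = tuple(['phi'] * n)          # most specific boundary (single member)
    G = {tuple(['?'] * n)}          # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}   # drop g's that reject x
            if not covers(S, x):
                S = generalize(S, x)
        else:
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)                 # already excludes x
                    continue
                # minimally specialize g to exclude x, keeping only
                # specializations that are still more general than S
                for i in range(n):
                    if g[i] != '?':
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            sp = g[:i] + (v,) + g[i + 1:]
                            if more_general_or_equal(sp, S):
                                new_G.add(sp)
            G = new_G
    return S, G

domains = [['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'],
           ['Normal', 'High'], ['Strong', 'Weak'],
           ['Warm', 'Cool'], ['Same', 'Change']]
data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(data, domains)
print(S)          # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(sorted(G))  # the two final G members from the walkthrough
```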
Inductive Learning:
There are different ways learning can be accomplished in a computer
program, and inductive learning is one such method.
Inductive learning: In this method the computer is fed labelled data
(solved examples). A learning algorithm goes through the given data
set and, with the help of these solved examples, produces a machine
learning model. This model can in turn take unlabelled data and
produce labels with significant accuracy. The figure below shows a
schematic of inductive learning.
In the context of inductive learning, we are given input samples (x)
and output samples (f(x)), and the objective is to estimate the
function f. The goal is to generalize from the samples so that the
output can be estimated for fresh samples in the future.
Regression
For a given dataset, a regression model outputs a real number; for
example, a model that gives temperature as its result.
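A minimal regression sketch: fit a line y = a·x + b by ordinary least squares and predict a real number. The data (hours of sunshine versus temperature) is made up for illustration and lies exactly on the line 10 + 2x.

```python
# Made-up data: temperature as a function of hours of sunshine.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [12.0, 14.0, 16.0, 18.0]          # exactly 10 + 2*x

# Ordinary least squares for a single feature.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)              # 2.0 10.0
print(a * 5.0 + b)       # prediction for x = 5: 20.0
```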
Binary Classification
For a given dataset, a binary classification model outputs one of two
labels. For example, a model can take an image as input and assign one of
two labels such as 'Cat' and 'Not cat'. In a binary model, the negation of
one label must imply that the given example belongs to the other label.
True/false is another good example, but 'cat' and 'not dog' is not a good
binary classification.
Multiclass classification
If there are more than two labels that the model can associate a given
example with, this is multiclass classification. Examples include a model
to predict the weather or to identify a color.
Ranking
Given a preference rule, the algorithm should return a sorted order. Such
examples include movie recommendations, Google searches, and name
suggestions on Facebook. Basically, the model takes an unsorted dataset
and sorts it based on the rules of preference. Now, buying recommendations
on Amazon could also be made using statistics, such as how many people
bought one item after another. The point here is that with statistics
someone has to build a filter which analyzes all the data and outputs a
preference, whereas with machine learning it is the machine that does this
job for us.
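At its simplest, ranking means sorting items under a preference rule. In the toy sketch below the "rule" is just a precomputed score per movie (titles and scores are made up):

```python
# A toy ranking sketch: given a preference rule (here, a score per
# item), return the items in sorted order.
movies = {"Inception": 8.8, "Cars": 7.1, "Up": 8.3}

# Sort the titles by their score, highest preference first.
ranked = sorted(movies, key=movies.get, reverse=True)
print(ranked)   # ['Inception', 'Up', 'Cars']
```

In a learned ranker the score function would itself be produced by the learning algorithm rather than written by hand.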
• Problems where there is no human expert. If people do not know the
answer they cannot write a program to solve it. These are areas of true
discovery.
• Humans can perform the task but no one can describe how to do it.
There are problems where humans can do things that computers cannot do,
or cannot do well. Examples include riding a bike or driving a car.
• Problems where the desired function changes frequently. Humans
could describe it and they could write a program to do it, but the problem
changes too often. It is not cost effective. Examples include the stock
market.
• Problems where each user needs a custom function. It is not cost
effective to write a custom program for each user. Example is
recommendations of movies or books on Netflix or Amazon.
Two perspectives on inductive learning:
• Learning is the removal of uncertainty. Having data removes some
uncertainty, and by selecting a class of hypotheses we remove more
uncertainty.
• Learning is guessing a good and small hypothesis class. This requires
guessing: we do not know the solution, so we must use a trial-and-error
process. If you knew the domain with certainty, you would not need
learning. But we are not guessing in the dark.
You could be wrong.
• Our prior knowledge could be wrong.
• Our guess of the hypothesis class could be wrong.
In practice we start with a small hypothesis class and slowly grow the
hypothesis class until we get a good result.
Some of the fundamental questions for inductive inference are:
• What happens if the target concept isn’t in the hypothesis space?
• Is it possible to avoid this problem by adopting a hypothesis space
that contains all potential hypotheses?
• What effect does the size of the hypothesis space have on the
algorithm’s capacity to generalize to unseen instances?
• What effect does the size of the hypothesis space have on the
number of training instances required?
The bias in the result that depends on the algorithm used to build the
model is called inductive bias. For example, suppose we provide a few
people with pictures of birds and animals and ask them to label them into
two groups. Some people may ask whether the organism can fly or not and
label based on that; others may ask whether the organism is a mammal or
not and end up with a different labelling. This kind of discrepancy in
results is a case of inductive bias.
The idea of inductive bias is to let the learner generalize beyond the
observed training examples to deduce new examples.
‘≻’ -> inductively inferred from.
For example,
x ≻ y means y is inductively inferred from x.
Afterwards, we decide which of these properties are relevant for the
particular things we want to deal with, and which of them have to be
fulfilled or not. For instance, we can answer the question
“What is a chair?”
by deciding which of the properties are relevant for the concept “chair,”
and which of them have to be fulfilled or not.
Hence, we obtain: