Unit 1 Machine learning aktu

The document provides an overview of learning in the context of machine learning, detailing the concept learning task, types of learning, and the process of designing a learning system. It emphasizes the importance of well-defined learning problems, the components involved, and the challenges faced in machine learning, such as data quality, overfitting, and ethical concerns. Additionally, it introduces algorithms like Find-S for concept learning and outlines the steps involved in machine learning model development.

Unit 1

INTRODUCTION – Well defined learning problems, Designing a Learning System, Issues in
Machine Learning; THE CONCEPT LEARNING TASK - General-to-specific ordering of
hypotheses, Find-S, List then eliminate algorithm, Candidate elimination algorithm, Inductive
bias.

Learning, in a broad sense, refers to the process of acquiring new knowledge, skills, behaviours, or
understanding through experience, study, or teaching. It is a fundamental aspect of human and animal
behaviour, allowing individuals to adapt to their environment and improve their ability to perform
tasks or respond to situations.
Key Aspects of Learning
1. Acquisition of Knowledge or Skills:
o Learning involves the intake of information, concepts, or skills, which can be through
formal education, practice, observation, or even trial and error.
2. Experience and Adaptation:
o Learning is often driven by experiences. As individuals encounter new situations, they
adapt their understanding or behavior based on these experiences. For instance,
learning to ride a bicycle involves adapting to the balance and coordination required
through practice.
3. Retention:
o Learning also involves the ability to retain information over time. This retention allows
individuals to apply previously acquired knowledge or skills to future situations.
4. Behavioral Change:
o Learning can result in changes in behavior. For example, a person who learns the
dangers of touching a hot surface may avoid doing so in the future.
5. Cognitive and Emotional Growth:
o Beyond just acquiring facts or skills, learning also contributes to cognitive and
emotional development, helping individuals understand the world better, solve
problems, and interact effectively with others.
Types of Learning
1. Classical Conditioning
• Definition: A type of associative learning where a neutral stimulus becomes associated with a
meaningful stimulus, leading to a conditioned response.
• Example: Pavlov’s dogs, who learned to salivate at the sound of a bell because they associated
it with food.
2. Operant Conditioning
• Definition: Learning through reinforcement (rewards) and punishment. An organism learns to
associate behaviours with their consequences.
• Example: A rat learning to press a lever to receive food (positive reinforcement) or avoid a
shock (negative reinforcement).
3. Observational Learning (Social Learning)
• Definition: Learning by observing and imitating the behaviour of others.
• Example: A child learning to tie their shoes by watching a parent or sibling.
4. Cognitive Learning
• Definition: Involves the acquisition of knowledge and skills through mental processes such as
thinking, understanding, and problem-solving.
• Example: Learning mathematical concepts through logical reasoning and problem-solving
activities.
5. Implicit Learning
• Definition: Learning that occurs unconsciously and automatically, without the learner being
aware of what they have learned.
• Example: Picking up grammar rules of a language without explicit instruction, simply by
being exposed to it.
6. Explicit Learning
• Definition: Learning that involves conscious awareness of what is being learned, typically
through direct instruction and deliberate practice.
• Example: Learning to drive a car by taking driving lessons and consciously practicing the
skills needed.
7. Experiential Learning
• Definition: Learning through experience, often involving hands-on or practical activities that
allow learners to apply knowledge in real-world situations.
• Example: Medical students learning surgery techniques by practicing on simulations or
cadavers.
8. Machine Learning (Artificial Learning)
• Definition: A type of learning used in artificial intelligence, where algorithms enable machines
to learn from data and make decisions or predictions based on that data.
• Example: A computer program learning to recognize patterns in images to identify objects,
such as facial recognition software.

Well-Defined Learning Problems


A well-defined learning problem in the context of machine learning refers to a task where the goal,
inputs, outputs, and performance measures are clearly specified. Such problems are structured in a
way that allows a machine learning model to be trained effectively, evaluated objectively, and applied
consistently.

According to Tom Mitchell, “A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.”
Components of a Well-Defined Learning Problem
1. Task (T):
o The specific task that the machine learning system is intended to perform. This could
be anything from classifying emails as spam or not spam to predicting the price of a
house.
2. Performance Measure (P):
o The criterion used to evaluate how well the model is performing the task. Common
performance measures include accuracy, precision, recall, F1 score, and mean squared
error (MSE).
3. Experience (E):
o The data or experiences that the model uses to learn. This usually involves a training
dataset, which the model uses to learn patterns and relationships that help it perform
the task.

A well-defined learning problem is typically expressed in the form: Given experience E, improve
performance P on task T.
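
The performance measures named above can be made concrete with a few lines of code. The following is a minimal sketch in plain Python; the label and price lists are hypothetical values invented for illustration (1 = spam, 0 = not spam), not data from any real system.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical spam labels (task T = classification, measure P = accuracy / F1).
spam_true = [1, 0, 1, 1, 0, 0, 1, 0]
spam_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(spam_true, spam_pred))
print(precision_recall_f1(spam_true, spam_pred))

# Hypothetical house prices (task T = regression, measure P = MSE).
price_true = [200.0, 350.0, 500.0]
price_pred = [210.0, 340.0, 520.0]
print(mean_squared_error(price_true, price_pred))
```

Accuracy and F1 apply to classification tasks such as spam detection, while MSE applies to regression tasks such as house price prediction.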

Examples of Well-Defined Learning Problems


1. Email Spam Detection
o Task (T): Classify emails as spam or not spam.
o Performance Measure (P): Accuracy or F1 score of the classification.
o Experience (E): A dataset of labeled emails, where each email is marked as spam or
not spam.
2. House Price Prediction
o Task (T): Predict the price of a house based on features such as location, size, number
of bedrooms, etc.
o Performance Measure (P): Mean squared error (MSE) between the predicted prices
and the actual prices.
o Experience (E): A dataset of houses with features and corresponding prices.

Designing a Learning System


According to Arthur Samuel, machine learning enables a machine to learn automatically from data, improve its
performance with experience, and make predictions without being explicitly programmed.
Designing a learning system, particularly in the context of machine learning, involves creating a system that
can learn from data to perform a specific task. This process requires careful consideration of the problem to be
solved, the data available, the choice of algorithms, and how the system will be evaluated and deployed.

Example: In a driverless car, training data is fed to the algorithm, covering how to drive on highways and on
busy and narrow streets, along with factors such as speed limits, parking, and stopping at signals. A logical and
mathematical model is then built from that data, and the car operates according to this model. In general, the
more data that is fed in, the better the output the model produces.

1. Define the Problem


o Task Specification: Clearly define the task that the learning system will perform. For
example, "classify emails as spam or not spam" or "predict the price of a house."
o Performance Measure: Determine how the system’s success will be measured. This
could be accuracy, precision, recall, mean squared error, or any other relevant metric.
o Objective: Identify the primary goal of the system, such as maximizing accuracy,
minimizing error, or optimizing some other criterion.
2. Collect and Prepare Data
o Data Collection: Gather the necessary data that the system will learn from. This data
could be historical records, user-generated content, sensor data, etc.
o Data Labeling: If using supervised learning, ensure that the data is properly labeled
with the correct outputs (e.g., spam/not spam labels for emails).
o Data Preprocessing: Clean and preprocess the data, which may include handling
missing values, normalizing data, removing outliers, and encoding categorical
variables.

3. Choose a Model
o Algorithm Selection: Choose an appropriate machine learning algorithm based on the
problem type (classification, regression, clustering, etc.) and the nature of the data.
o Model Complexity: Consider the complexity of the model, balancing between
underfitting (too simple) and overfitting (too complex). Techniques like regularization
can help manage this balance.
4. Train the Model
o Split Data: Divide the data into training, validation, and test sets to evaluate the
model's performance at different stages.
o Model Training: Use the training data to teach the model, adjusting its parameters to
minimize error or maximize the chosen performance metric.
o Hyperparameter Tuning: Optimize the model’s hyperparameters (e.g., learning rate,
number of layers in a neural network) using techniques like grid search or random
search on the validation set.
5. Evaluate the Model
o Validation: Evaluate the model on the validation set to check for overfitting or
underfitting. Make necessary adjustments based on this feedback.
o Testing: Once satisfied with the model, evaluate its performance on the test set to get
an unbiased estimate of how it will perform on new, unseen data.
o Performance Metrics: Calculate and analyze the performance metrics (e.g., accuracy,
precision, recall, F1 score) to assess how well the model meets the problem’s
objectives.

6. Optimize and Iterate


o Model Refinement: Based on the evaluation results, refine the model. This could
involve collecting more data, selecting a different algorithm, or tuning
hyperparameters further.
o Feature Engineering: Improve the model by creating new features, transforming
existing ones, or selecting the most important features.
o Iteration: Machine learning is an iterative process. Repeatedly refine and evaluate the
model until it reaches satisfactory performance.
7. Deploy the Model
o Deployment Strategy: Decide how the model will be deployed in the real world. This
could involve integrating the model into an existing system, creating an API, or
embedding it in a software application.
o Monitoring: Set up monitoring to track the model’s performance over time, ensuring
it continues to perform well as it encounters new data.
o Maintenance: Be prepared to update the model as necessary, retraining it with new
data to handle changes in the problem domain or data distribution.
8. Post-Deployment Considerations
o Feedback Loop: Implement a feedback loop where the system can learn from new
data, improving over time.
o Model Retraining: Periodically retrain the model with new data to adapt to changes
and maintain performance.
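
The train, tune, and test steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the dataset is synthetic (a hypothetical 1-D feature with noisy labels), the model is a simple k-nearest-neighbour classifier chosen for brevity, and the helper names (make_data, split_data, knn_predict) are made up for this sketch.

```python
import random


def make_data(n=100, seed=0):
    """Hypothetical 1-D dataset: label is True when x > 0, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        label = x > 0
        if rng.random() < 0.1:
            label = not label
        data.append((x, label))
    return data


def split_data(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then split into training, validation, and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    positive_votes = sum(1 for _, label in nearest if label)
    return positive_votes * 2 > len(nearest)


def accuracy(train, examples, k):
    """Accuracy of the k-NN model (built from `train`) on `examples`."""
    return sum(knn_predict(train, x, k) == y for x, y in examples) / len(examples)


data = make_data()
train, val, test = split_data(data)

# Hyperparameter tuning: pick k by a small grid search on the validation set.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, val, k))

# Final, unbiased estimate: touch the test set only once, after tuning.
test_acc = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", test_acc)
```

The test set plays no part in choosing k, which is what keeps the final accuracy figure an honest estimate of performance on unseen data.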

Issues in Machine Learning:


Machine learning, while powerful, comes with a variety of challenges and issues that can affect the
development, deployment, and performance of models. Here are some of the key issues in machine
learning:
1. Data Quality and Quantity
• Insufficient Data: Machine learning models require large amounts of data to learn effectively.
Insufficient data can lead to poor model performance.
• Data Imbalance: In many real-world datasets, some classes are underrepresented, leading to
biased models that perform well on the majority class but poorly on the minority class.
• Noisy Data: Data that contains errors, outliers, or irrelevant features can mislead the learning
process, resulting in inaccurate models.
• Data Preprocessing: The process of cleaning, normalizing, and transforming data is crucial
but can be complex and time-consuming.
2. Overfitting and Underfitting
• Overfitting: When a model learns the training data too well, including its noise and outliers, it
performs poorly on unseen data. Overfitting typically indicates a model that is too complex for
the problem or for the amount of data available.
• Underfitting: Occurs when a model is too simple to capture the underlying patterns in the
data, leading to poor performance both on training and test data.
3. Model Interpretability
• Black-Box Models: Many advanced models, such as deep neural networks, are difficult to
interpret. This lack of transparency can be problematic in applications where understanding
the decision-making process is crucial, like healthcare or finance.
• Explainability: Providing explanations for predictions is increasingly important, especially in
regulated industries. Balancing model accuracy with interpretability remains a significant
challenge.

4. Computational Complexity
• Training Time: Some machine learning models, especially deep learning models, require
significant computational resources and time to train, which can be a barrier for many
practitioners.
• Scalability: As datasets grow in size, both in terms of samples and features, the computational
requirements for training and deploying models increase dramatically.
• Resource Constraints: Deploying machine learning models in resource-limited
environments, such as mobile devices or edge computing, requires careful optimization.
5. Generalization
• Domain Adaptation: Models trained on data from one domain may not generalize well to
another domain, a problem known as domain shift. This issue is common when there are
differences in the data distribution between the training and test datasets.
• Transfer Learning: While transfer learning aims to apply knowledge from one domain to
another, it’s not always straightforward and can result in performance degradation if the
domains are too different.
6. Ethical and Privacy Concerns
• Data Privacy: Machine learning models often require large amounts of personal data, raising
concerns about data privacy and security. Ensuring compliance with regulations like GDPR is
crucial.
• Ethical Use: The potential misuse of machine learning models, such as in surveillance or
autonomous weapons, raises significant ethical questions.
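
The data imbalance issue listed under item 1 above is easy to demonstrate. The sketch below uses a hypothetical dataset of 95 negative and 5 positive labels and a "model" that always predicts the majority class; the numbers are invented for illustration.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy looks excellent because the majority class dominates.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class reveals that no positive is ever found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print("accuracy:", accuracy)
print("minority-class recall:", recall)
```

A high accuracy alongside zero minority-class recall is the reason measures such as precision, recall, and F1 score are preferred over accuracy on imbalanced data.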

Concept Learning
In machine learning (ML), concept learning refers to the ability of an algorithm to learn a generalized
concept from given examples. It involves identifying patterns and categories from data so the system
can correctly classify new, unseen examples. In this context, concepts are usually represented by a set
of features or attributes, and the goal is to infer a model that defines a category or class from these
features.

Steps in Concept Learning:


1. Input Data: The algorithm is provided with training examples consisting of features and their
corresponding labels (positive or negative instances of a concept).
2. Hypothesis Space: This is the set of all possible hypotheses that the model can learn. A
hypothesis defines a possible concept, i.e., a mapping from input features to a classification.
3. Learning: The algorithm searches through the hypothesis space to find the hypothesis that
best fits the training data. The objective is to find a hypothesis that generalizes well to unseen
data.
4. Evaluation: After learning, the algorithm is evaluated on test data to assess its performance in
terms of accuracy, precision, recall, etc.

Concept Learning Example:


• Task: Learn the concept of "fruit" from a dataset of objects.
o Features: Shape, color, size, taste.
o Examples: Apples (positive instances), stones (negative instances).
o Hypothesis: The system would generate rules like "If the object is round and sweet, it
is likely a fruit."

Note: A hypothesis refers to a candidate function or model that the algorithm uses to map inputs
(features) to outputs (predictions). It is essentially an educated guess that the learning algorithm makes
about the relationship between the input data and the target output.

Challenges and Limitations of Concept Learning


While concept learning is a powerful tool in machine learning, it has several challenges and limitations,
including:
Overfitting
Concept learning models can suffer from overfitting, where the model becomes too complex and performs well
on the training data but poorly on new data.
Underfitting
Concept learning models can also suffer from underfitting, where the model is too simple and fails to capture
the underlying concept or pattern in the data.
Data Quality
Concept learning models are sensitive to the quality of the data, and poor data quality can lead to poor model
performance.

Find-S Algorithm:
The Find-S algorithm is a basic concept learning algorithm in machine learning. It finds the most specific
hypothesis that fits all the positive examples. Note that the algorithm considers only the positive training
examples and ignores the negative ones. Find-S starts with the most specific hypothesis and generalizes it
each time it fails to cover an observed positive training example. Hence, the Find-S algorithm moves from the
most specific hypothesis toward more general hypotheses, generalizing only as far as the positive examples require.
Important Representation:
1. ? indicates that any value is acceptable for the attribute.
2. A specific value (e.g., Cold) indicates that only that single value is acceptable for the attribute.
3. ϕ indicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
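
The representation above can be turned into a small membership test. This is a minimal sketch, assuming hypotheses and instances are stored as tuples of strings; the function name covers and the sample instance (taken from the Enjoy Sport attributes Sky, Air Temp, Humidity, Wind, Water, Forecast) are choices made for this illustration.

```python
ANY = "?"    # any value is acceptable for the attribute
NONE = "ϕ"   # no value is acceptable for the attribute


def covers(hypothesis, instance):
    """Return True if the hypothesis classifies the instance as positive."""
    return all(
        h != NONE and (h == ANY or h == x)
        for h, x in zip(hypothesis, instance)
    )


most_general = (ANY,) * 6
most_specific = (NONE,) * 6
cold_only = (ANY, "Cold", ANY, ANY, ANY, ANY)  # requires Air Temp = Cold

instance = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")

print(covers(most_general, instance))   # every instance is accepted
print(covers(most_specific, instance))  # no instance is accepted
print(covers(cold_only, instance))      # rejected: Air Temp is Warm, not Cold
```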
Steps Involved in Find-S:
1. Start with the most specific hypothesis:
h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
2. Take the next example. If it is negative, make no change to the
hypothesis.
3. If the example is positive and the current hypothesis is too
specific to cover it, generalize the hypothesis: replace each
attribute constraint that the example does not satisfy with the
next more general constraint.
4. Repeat steps 2 and 3 until all the training examples have been
processed.
5. The hypothesis that remains after the last training example is
the final hypothesis, which we can use to classify new
examples.
Example:
Consider the following data set having the data about which particular seeds are poisonous.

First, we initialize the hypothesis to the most specific hypothesis. Since each example in this data set
has four attributes, our hypothesis would be:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1:
The data in example 1 is {GREEN, HARD, NO, WRINKLED}, and it has a positive outcome. Our initial
hypothesis is too specific to cover it, so we generalize it just enough to match this example. Hence,
the hypothesis becomes:
h = {GREEN, HARD, NO, WRINKLED}

Consider example 2:
Here we see that this example has a negative outcome. Hence, we neglect this example and our
hypothesis remains the same.
h = {GREEN, HARD, NO, WRINKLED}
Consider example 3:
Here we see that this example has a negative outcome. Hence, we neglect this example and our
hypothesis remains the same.
h = {GREEN, HARD, NO, WRINKLED}

Consider example 4:
The data present in example 4 is {ORANGE, HARD, NO, WRINKLED}, and it has a positive outcome.
We compare every attribute with the current hypothesis, and wherever a mismatch is found we replace
that attribute with the general case ("?"). Only the first attribute differs (ORANGE versus GREEN), so
the hypothesis becomes:
h = {?, HARD, NO, WRINKLED}

Consider example 5:
The data present in example 5 is {GREEN, SOFT, YES, SMOOTH}, and it has a positive outcome.
We again compare every attribute with the current hypothesis and replace each mismatching attribute
with the general case ("?"). The first attribute is already "?", and the remaining three attributes all
mismatch, so the hypothesis becomes:
h = {?, ?, ?, ?}

Since every attribute in our hypothesis is now the general condition, the hypothesis cannot be
generalized any further, and examples 6 and 7 leave it unchanged:
h = {?, ?, ?, ?}
Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = {?, ?, ?, ?}
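
The walkthrough above can be reproduced with a short program. This is a minimal sketch of Find-S, not a reference implementation: None stands in for the all-ϕ hypothesis, and only the three positive examples whose attribute values appear in the text (examples 1, 4, and 5) are included, since Find-S skips negative examples anyway.

```python
def find_s(examples):
    """Return the most specific hypothesis covering every positive example.

    `examples` is a list of (attributes, is_positive) pairs.
    None represents the most specific hypothesis {ϕ, ..., ϕ}.
    """
    h = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if h is None:
            # First positive example: adopt its attribute values directly.
            h = list(attributes)
        else:
            # Keep matching attributes, generalize mismatches to "?".
            h = [hv if hv == xv else "?" for hv, xv in zip(h, attributes)]
    return h


# Positive examples taken from the walkthrough (examples 1, 4, and 5).
seed_examples = [
    (("GREEN", "HARD", "NO", "WRINKLED"), True),
    (("ORANGE", "HARD", "NO", "WRINKLED"), True),
    (("GREEN", "SOFT", "YES", "SMOOTH"), True),
]

print(find_s(seed_examples[:1]))
print(find_s(seed_examples[:2]))
print(find_s(seed_examples))
```

The three printed hypotheses correspond to the states of h after examples 1, 4, and 5 in the walkthrough.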

Limitations of Find-S Algorithm


The Find-S algorithm has a few limitations:
1. There is no way to determine whether the final hypothesis is the only hypothesis consistent
with the data, or whether other consistent hypotheses also exist.
2. Inconsistent training sets (for example, data containing errors or noise) can mislead the
Find-S algorithm, since it ignores the negative examples and so cannot detect the inconsistency.
3. The Find-S algorithm does not provide a backtracking technique to revise earlier
generalizations in order to improve the resulting hypothesis.

The List-Then-Eliminate Algorithm


The List-Then-Eliminate Algorithm is a simple algorithm used in machine learning, particularly in
the context of version space learning. This algorithm works by maintaining a set of all hypotheses
that are consistent with the training data (the version space), and as more examples are observed, it
eliminates any hypotheses that are inconsistent with the new examples.

To understand it from scratch let’s have a look at all the terminologies involved,
Hypothesis:
It is usually represented with an ‘h’. In supervised machine learning, a hypothesis is a function that
best characterizes the target.

Specific Hypothesis:
A hypothesis h is said to be the most specific hypothesis if it covers none of the negative examples
and there is no other hypothesis h′ that covers none of the negative examples such that h is strictly
more general than h′.

Before understanding version space, let’s first look at the formal definition of ‘consistent’:

A hypothesis h is consistent with a collection of training examples D if and only if h(x) = c(x) for
each example (x, c(x)) in D.
Let’s use the Enjoy Sport example to better understand what a consistent hypothesis means:

Example  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
1        Sunny  Warm      Normal    Strong  Warm   Same      Yes
2        Sunny  Warm      High      Strong  Warm   Same      Yes
3        Rainy  Cold      High      Strong  Warm   Change    No
4        Sunny  Warm      High      Strong  Cool   Change    Yes

Here, consider a hypothesis, h1 = <?, ?, ?, Strong, ?, ?>.

h1 classifies training example (3) as positive because its Wind value is Strong, but the actual label of
that example is No, so h1(x) != c(x) for example (3). As a result, hypothesis h1 is not consistent with
the training data set.

Now consider a hypothesis, h2 = <?, Warm, ?, Strong, ?, ?>.

h2(x) = c(x) holds for all of the training examples: h2 covers the three positive examples and rejects
example (3), whose Air Temp is Cold. As a result, hypothesis h2 is consistent with the
training data set.
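
The consistency check for h1 and h2 can be verified mechanically. The sketch below encodes the four Enjoy Sport examples from the table; the function names covers and consistent are chosen for this illustration.

```python
# The four Enjoy Sport training examples: (attribute values, c(x)).
ENJOY_SPORT = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def covers(h, x):
    """h(x): True when every constraint in h is '?' or equals the attribute value."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every (x, c(x)) in D."""
    return all(covers(h, x) == label for x, label in examples)


h1 = ("?", "?", "?", "Strong", "?", "?")
h2 = ("?", "Warm", "?", "Strong", "?", "?")

print(consistent(h1, ENJOY_SPORT))  # h1 wrongly accepts the negative example
print(consistent(h2, ENJOY_SPORT))  # h2 agrees with every label
```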

Version Space:
With regard to hypothesis space H and training examples D, the version space, denoted VS_{H,D}, is
the subset of hypotheses from H that are consistent with the training examples in D.

In the example above, the following two hypotheses from H are both consistent with the training
dataset:

h3 = <Sunny, Warm, ?, Strong, ?, ?> and

h2 = <?, Warm, ?, Strong, ?, ?>

As a result, both h3 and h2 belong to the version space, which consists of these and every other
hypothesis in H that is consistent with the training data.

List – Then – Eliminate:


The LIST-THEN-ELIMINATE method first populates the version space with all hypotheses in H, then
discards any hypothesis that contradicts any training example.

As more instances are observed, the version space of candidate hypotheses reduces, until ideally just
one hypothesis exists that is compatible with all of the observed cases.
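
The List-Then-Eliminate procedure described above can be sketched directly for the Enjoy Sport data. Two assumptions are made here: each attribute's domain is taken to be only the values that appear in the training table, and the all-ϕ hypothesis is left out of the enumeration because it cannot be consistent with any positive example.

```python
from itertools import product

# The four Enjoy Sport training examples: (attribute values, c(x)).
ENJOY_SPORT = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

# Assumed attribute domains: only the values seen in the table above.
DOMAINS = [
    ["Sunny", "Rainy"],   # Sky
    ["Warm", "Cold"],     # Air Temp
    ["Normal", "High"],   # Humidity
    ["Strong"],           # Wind
    ["Warm", "Cool"],     # Water
    ["Same", "Change"],   # Forecast
]


def covers(h, x):
    """True when every constraint in h is '?' or equals the attribute value."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def list_then_eliminate(domains, examples):
    """List every hypothesis in H, then drop those that contradict an example."""
    version_space = list(product(*[values + ["?"] for values in domains]))
    for x, label in examples:
        version_space = [h for h in version_space if covers(h, x) == label]
    return version_space


version_space = list_then_eliminate(DOMAINS, ENJOY_SPORT)
for h in version_space:
    print(h)
```

Because every hypothesis in H is listed explicitly, this approach is practical only for very small hypothesis spaces.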

Inductive bias
Inductive bias is a fundamental concept in machine learning and refers to the set of assumptions a
learning algorithm makes to generalize from training data to unseen data. Essentially, it's the prior
knowledge that guides the algorithm in making predictions when it encounters new, unseen instances.

Importance of Inductive Bias


Inductive bias is crucial in machine learning as it helps algorithms generalize from limited training
data to unseen data. Without a well-defined inductive bias, algorithms may struggle to make accurate
predictions or may overfit the training data, leading to poor performance on new data.
Understanding the inductive bias of an algorithm is essential for model selection, as different biases
may be more suitable for different types of data or tasks. It also provides insights into how the
algorithm is learning and what assumptions it is making about the data, which can aid in interpreting
its predictions and results.
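
The role of inductive bias can be illustrated by comparing two learners on the Enjoy Sport examples. This is a rough sketch under stated assumptions: rote_learner (a made-up name) simply memorizes the training examples and assumes nothing beyond them, while conjunctive_learner follows the Find-S idea and assumes the target concept is a conjunction of attribute constraints. The unseen instance is hypothetical.

```python
ENJOY_SPORT = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def rote_learner(examples):
    """No inductive bias: answers only for instances it has already seen."""
    memory = {x: label for x, label in examples}
    return lambda x: memory.get(x)  # None means "cannot classify"


def conjunctive_learner(examples):
    """Bias: the target concept is a conjunction of attribute constraints."""
    h = None  # stands for the most specific hypothesis
    for x, label in examples:
        if label:
            h = list(x) if h is None else [a if a == b else "?" for a, b in zip(h, x)]
    return lambda x: h is not None and all(a in ("?", b) for a, b in zip(h, x))


rote = rote_learner(ENJOY_SPORT)
conjunctive = conjunctive_learner(ENJOY_SPORT)

# Hypothetical instance that does not appear in the training data.
unseen = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")

print(rote(unseen))         # the rote learner has no basis for an answer
print(conjunctive(unseen))  # the biased learner generalizes to a prediction
```

The biased learner can commit to a prediction only because of its assumption about the form of the target concept; if that assumption is wrong for the task, its prediction may be wrong as well.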
